Abstract

We review some probabilistic properties of the sum-of-digits function of random integers.
New asymptotic approximations to the total variation distance and its refinements are also
derived. Four different approaches are used: a classical probability approach, Stein's
method, an analytic approach and a new approach based on Krawtchouk polynomials and
the Parseval identity. We also extend the study to a simple, general numeration system for
which similar approximation theorems are derived.
MSC 2000 Subject Classifications: Primary 60F05, 60C05; secondary 62E17, 11N37, 11K16.
Key words: Sum-of-digits function, Stein's method, Gray codes, total variation distance,
numeration systems, Krawtchouk polynomials, digital sums, asymptotic normality



1 Introduction 

Positional numeral systems have long been used in the history of human civilizations, and the
sum-of-digits function of an integer, which equals the sum of all its digits in some given base,
appeared naturally in multitudinous applications such as divisibility checks or check-sum
algorithms. Early publications dealing with divisibility of integers using digit sums date back to
at least Blaise Pascal's Œuvres¹ in the mid-1650s; see Glaser's interesting account [ ].
Numerous properties of the sum-of-digits function have been extensively studied in the literature
since then; see Chapter XX of Dickson's History of Number Theory [32], which contains a
detailed annotated bibliography for publications up to the early 20th century dealing with the
digits of an integer; the properties discussed include relations between the digit structures of
$n$ and $n^2$, the iterated sum-of-digits function, general numeral bases, etc. Modern reviews
on digital sums and number systems can be found in Stolarsky's paper [ ] and the books by
Knuth [ , §4.1], Ifrah [ ], Allouche and Shallit [ , Ch. 3], Sandor and Crstici [111, §4.3],
Berthe and Rigo [12]. See also the two papers by Barat and Grabner [4] and by Mauduit and
Rivat [94] for more useful pointers to several directions relevant to the sum-of-digits function.
We are concerned in this paper with the distributional aspect of the sum-of-digits function of 
random integers. Many other types of results have been investigated in the literature and will 
not be reviewed here; most of these results deal with dynamical properties, exponential sums, 
Dirichlet series, block occurrences, Thue-Morse sequence, congruential properties, connec- 
tions to other structures, additivity, uniform distribution and discrepancy, sum-of-digits under 
special subsequences, etc. 

More precisely, let $q \ge 2$ be a fixed integer and $n = \sum_{0\le j\le\lambda} \varepsilon_j q^j$,
where $\varepsilon_j \in \{0, \dots, q-1\}$ and $\lambda = \lfloor \log_q n \rfloor$. Then the
sum-of-digits function $\nu_q(n)$ of $n$ in base $q$ is defined as $\sum_{0\le j\le\lambda} \varepsilon_j$.
When $q = 2$, we write $\nu(n) = \nu_2(n)$, which is the number of ones in the binary representation
of $n$.

Figure 1: $\nu_2(n)$, $n \le 256$.

Since the distribution of $\nu_q(n)$ is irregular in the sense that its values can vary
between $(q-1)\lfloor \log_q n \rfloor$ and 1 (see Figure 1 for $q = 2$), we consider
$X_n = X_n(q)$, which denotes the random variable equal to $\nu_q(U_n)$, where $U_n$
assumes each of the values $\{0, \dots, n-1\}$ with equal probability $1/n$. The behavior
of $X_n$ is then smoother.
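The definitions above can be made concrete with a small sketch. The function names `digit_sum` and `distribution` are illustrative choices, not notation from the paper; the code simply enumerates $\{0, \dots, n-1\}$ and tabulates the exact law of $X_n$ with rational arithmetic.

```python
from collections import Counter
from fractions import Fraction


def digit_sum(n, q=2):
    """Return nu_q(n), the sum of the base-q digits of the integer n >= 0."""
    total = 0
    while n > 0:
        n, digit = divmod(n, q)
        total += digit
    return total


def distribution(n, q=2):
    """Exact law of X_n = nu_q(U_n), with U_n uniform on {0, ..., n-1}."""
    counts = Counter(digit_sum(j, q) for j in range(n))
    return {k: Fraction(c, n) for k, c in sorted(counts.items())}
```

For instance, `distribution(8)` returns the Binom(3, 1/2) probabilities, since for $n = 2^3$ the three binary digits of $U_n$ are independent fair bits.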

Obviously, when n = — 1, the distribution of X„ is exactly multinomial with parameters 
k and g identical probabilities 1/g. The general difficulty then lies in estimating the closeness 
between the distribution of X„ and a suitably chosen multinomial distribution. Periodicities are 
then ubiquitous in the study of most asymptotic problems involved. 

We will mainly review known results for the mean, the variance, the higher moments and 
the limit distribution of Xn, as well as related asymptotic approximations. It turns out that 
many of such results have been derived independently in the literature, and rediscoveries are 
not uncommon. As Stolarsky [ ] puts it

"Whatever its mathematical virtues, the literature on sums of digital sums reflects
a lack of communication between researchers."

In view of the large number of independent discoveries it is likely that we missed some papers
in our attempt to give a more complete collection of relevant known results.

¹ Pascal's Œuvres is freely available on Wikisource.

In addition to reviewing known stochastic properties of $X_n$, we will present new
approximations to the distribution of $X_n$. For simplicity, we focus on the binary case $q = 2$,
leaving the straightforward extension to other numeration systems to the interested reader.
In particular, our results imply that the total variation distance between the distribution of
$X_n$ and a binomial random variable $Y_\lambda$ of parameters $\lambda := \lfloor \log_2 n \rfloor$
and $\frac12$ satisfies

$$d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) := \frac12 \sum_{k\ge 0} \left| \mathbb{P}(X_n = k) - 2^{-\lambda}\binom{\lambda}{k} \right| = \frac{\sqrt{2}\,|F(\log_2 n)|}{\sqrt{\pi \log_2 n}} + O(\lambda^{-1}), \qquad (1)$$

where, interestingly,

$$F(\log_2 n) = \mathbb{E}(X_n) - \frac{\lambda}{2}.$$

The function $F$ is a bounded, periodic function (namely, $F(x) = F(x+1)$) with
discontinuities at integers; see (19) for the definition of $F$ for arbitrary $x$.

Figure 2: $d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda))$.

We see that, up to the order $1/\sqrt{\log_2 n}$, the total variation distance is essentially
asymptotic to the absolute difference between the mean and $\lambda/2$. Finer approximations
will also be derived.

Four different proofs will be given for clarifying the total variation distance, and each has
its own generality; these include an elementary probability approach, Stein's method, Fourier
analysis, and a new Krawtchouk-Parseval approach. Indeed, these approaches easily extend
to the consideration of more general frameworks, a simple one being briefly considered that
applies in particular to the number of ones in the binary-reflected Gray codes.
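As a hedged numerical companion to the approximation (1), the following sketch computes the total variation distance between $X_n$ and Binom($\lambda$, 1/2) by brute force, together with the quantity $\sqrt{2}\,|\mathbb{E}(X_n) - \lambda/2| / \sqrt{\pi \log_2 n}$ suggested by the discussion above. The function names are illustrative, and the reading of the leading term is an assumption drawn from the surrounding text; the code makes no claim about how close the two numbers are for any particular $n$.

```python
from collections import Counter
from math import comb, log2, pi, sqrt


def ones_counts(n):
    """Histogram of the number of ones in the binary expansions of 0..n-1."""
    return Counter(bin(j).count("1") for j in range(n))


def tv_distance_to_binomial(n):
    """Total variation distance between X_n and Binom(lam, 1/2), n >= 1."""
    lam = n.bit_length() - 1  # floor(log2(n)), computed exactly
    counts = ones_counts(n)
    total = 0.0
    for k in range(lam + 1):  # X_n never exceeds lam
        total += abs(counts.get(k, 0) / n - comb(lam, k) / 2**lam)
    return total / 2


def leading_term(n):
    """sqrt(2) |E(X_n) - lam/2| / sqrt(pi log2 n), assumed leading term, n >= 2."""
    lam = n.bit_length() - 1
    mean = sum(k * c for k, c in ones_counts(n).items()) / n
    return sqrt(2) * abs(mean - lam / 2) / sqrt(pi * log2(n))
```

When $n$ is a power of two both functions return zero, because $X_n$ is then exactly binomial.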



1.1 First moment of Xn 

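The mean of $X_n$ is essentially the partial sum of $\nu_q(j)$

$$S_q(n) := n\,\mathbb{E}(X_n) = \sum_{0\le j<n} \nu_q(j),$$

which, by the relation $\nu_q(qj + r) = \nu_q(j) + r$ for $0 \le r < q$, satisfies the following recurrence

$$S_q(n) = \sum_{1\le r\le q} \left( S_q\!\left(\left\lfloor \frac{n+r-1}{q} \right\rfloor\right) + (q-r)\left\lfloor \frac{n+r-1}{q} \right\rfloor \right) \qquad (n \ge 2),$$

with $S_q(n) = 0$ for $n \le 1$. In particular, when $q = 2$, this recurrence has the form

$$S_2(n) = S_2\!\left(\left\lfloor \frac n2 \right\rfloor\right) + S_2\!\left(\left\lceil \frac n2 \right\rceil\right) + \left\lfloor \frac n2 \right\rfloor.$$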
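The recurrence comes from splitting the integers $j < n$ according to their last base-$q$ digit $r$. A minimal sketch of that idea follows; the names `S` and `digit_sum` are illustrative, and the way the split is indexed here (by $r = 0, \dots, q-1$) is one convenient choice that need not match the indexing of the displayed formula.

```python
from functools import lru_cache


def digit_sum(n, q):
    """Sum of the base-q digits of n >= 0."""
    total = 0
    while n > 0:
        n, digit = divmod(n, q)
        total += digit
    return total


@lru_cache(maxsize=None)
def S(n, q=2):
    """S_q(n) = sum of nu_q(j) over 0 <= j < n, via the last-digit split."""
    if n <= 1:
        return 0
    total = 0
    for r in range(q):
        # c = number of j >= 0 with q*j + r < n; each contributes nu_q(j) + r
        c = (n - r + q - 1) // q
        total += S(c, q) + r * c
    return total
```

For $q = 2$ the loop reduces to $S_2(\lceil n/2\rceil) + S_2(\lfloor n/2\rfloor) + \lfloor n/2\rfloor$, and the memoized recursion needs only $O(\log n)$ distinct arguments.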
For many interesting recurrences for $S_2(n)$, see [95]. Interestingly, the quantity $S_2(n)$ appeared
naturally in a large number of concrete applications and is given as A000788 in Sloane's
Encyclopedia of Integer Sequences. A partial list when $q = 2$ is given as follows.

• The number of bisecting strategies in certain games [ ];

• Linear forms in number theory [ ];

• Determinant of some matrix of order $n$ [ ]; see also [ ] for an extension to $q \ge 2$;

• Bounds for the number of edges in a certain class of graphs [58, 61];

• The solution to the recurrence $f(n) = \max_k \{ f(k) + f(n-k) + \min\{k, n-k\} \}$ with
$f(1) = 0$ is exactly $S_2(n)$; concrete instances where this recurrence arises can be found
in [59, §2.2.1] and [61, 95]; see also [1];

• The number of comparators used by Batcher's bitonic sorting network [66];

• External left length of some binary trees [82];

• The minimum number of comparisons used by

- top-down recursive mergesort [ ];

- bottom-up mergesort [ ];

- queue-mergesort [ ];

• The number of runs for the output sequence of recursive mergesort with highly erroneous
comparisons; see [60].

This list of concrete examples, albeit nonrandom in nature, shows the richness and diversity of 
the sum-of-digits function. 
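One item of the list, the maximization recurrence $f(n) = \max_k\{f(k) + f(n-k) + \min\{k, n-k\}\}$ with $f(1) = 0$, is easy to check numerically against the digit sums. The sketch below assumes that $k$ ranges over $1 \le k < n$; the helper names are illustrative.

```python
def max_recurrence(limit):
    """Table f[1..limit] of f(n) = max_k { f(k) + f(n-k) + min(k, n-k) }, f(1) = 0."""
    f = [0] * (limit + 1)
    for n in range(2, limit + 1):
        f[n] = max(f[k] + f[n - k] + min(k, n - k) for k in range(1, n))
    return f


def S2(n):
    """Total number of ones in the binary expansions of 0, 1, ..., n-1."""
    return sum(bin(j).count("1") for j in range(n))
```

Comparing `max_recurrence(N)[n]` with `S2(n)` for a range of $n$ gives a quick sanity check of the stated identity.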

Legendre, in his Théorie des nombres, whose first edition was published in 1798, derived
the relation

$$\nu_q(n) = n - (q-1) \sum_{j\ge 1} \left\lfloor \frac{n}{q^j} \right\rfloor;$$

see [ , Tome I, p. 12]. This relation has proved useful in establishing many properties
connected to $\nu_q(n)$, including notably the identity (5) below. On the other hand, since
$\sum_{j\ge 1} \lfloor n/q^j \rfloor$ equals, for prime $q$, the $q$-adic valuation of $n!$ (namely,
the largest power of $q$ that divides $n!$), the above relation has also been widely used in the
$q$-adic valuations of many famous numbers. For an extension of the right-hand side, see [ ].
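Legendre's relation and its link with the valuation of $n!$ can be verified directly. The sketch below uses illustrative helper names and treats the valuation statement only for prime bases, where it is known to hold.

```python
def digit_sum(n, q):
    """Sum of the base-q digits of n >= 0."""
    total = 0
    while n > 0:
        n, digit = divmod(n, q)
        total += digit
    return total


def floor_sum(n, q):
    """sum_{j >= 1} floor(n / q^j)."""
    total, power = 0, q
    while power <= n:
        total += n // power
        power *= q
    return total


def valuation_of_factorial(n, p):
    """Exponent of the prime p in n!, by direct factor counting."""
    count = 0
    for m in range(2, n + 1):
        while m % p == 0:
            m //= p
            count += 1
    return count
```

With these helpers, `digit_sum(n, q) == n - (q - 1) * floor_sum(n, q)` for every base, and `floor_sum(n, p) == valuation_of_factorial(n, p)` for prime `p`.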

About nine decades later, d'Ocagne [33] proved in 1886 an identity for $S_q(n)$ for $q = 10$
(see also [32, p. 457]); his identity easily extends to any base $q \ge 2$ and can be rewritten as
follows. Write $n = \sum_{0\le j\le\lambda} \varepsilon_j q^j$, where $\varepsilon_j = \varepsilon_j(n) \in \{0, 1, \dots, q-1\}$.
Then d'Ocagne's expression is identical to

$$S_q(n) = \sum_{0\le j\le\lambda} \varepsilon_j q^j \left( \frac{q-1}{2}\, j + \frac{\varepsilon_j - 1}{2} + \sum_{j<i\le\lambda} \varepsilon_i \right). \qquad (2)$$

In particular, when $q = 2$, we can write $n = \sum_{1\le j\le s} 2^{\lambda_j}$, where
$\lambda_1 > \cdots > \lambda_s \ge 0$, and (2) has the alternative form

$$S_2(n) = \sum_{1\le j\le s} 2^{\lambda_j} \left( \frac{\lambda_j}{2} + j - 1 \right), \qquad (3)$$

where $s = \nu_2(n)$. An extension of this expression can be found in [ 07, 129]. Since the proof
of d'Ocagne's expression is very simple (summing over all coefficients block by block), it has
remained almost unnoticed in the literature. Similar expressions appeared and were used in several
later publications; see, for example, [11, 22, 47, 60, 80, 82, 115, 129].
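The block-by-block count behind d'Ocagne's expression can be sketched for $q = 2$. The closed form used here, $S_2(n) = \sum_{1\le j\le s} 2^{\lambda_j}(\lambda_j/2 + j - 1)$ for $n = 2^{\lambda_1} + \cdots + 2^{\lambda_s}$, is an assumed way of writing the binary case: block $j$ consists of $2^{\lambda_j}$ integers sharing $j - 1$ leading ones and having $\lambda_j$ free bits. The helper names are illustrative.

```python
def one_positions(n):
    """Exponents lambda_1 > ... > lambda_s of the ones in the binary expansion of n."""
    return [i for i in range(n.bit_length() - 1, -1, -1) if (n >> i) & 1]


def S2_blocks(n):
    """S_2(n) from the block-by-block count, using integer arithmetic only."""
    doubled = sum(
        (1 << lam) * (lam + 2 * (j - 1))  # twice the contribution of block j
        for j, lam in enumerate(one_positions(n), start=1)
    )
    return doubled // 2  # every doubled term is even, so this is exact


def S2_brute(n):
    """Direct summation of the number of ones over 0, 1, ..., n-1."""
    return sum(bin(m).count("1") for m in range(n))
```

The closed form needs only the positions of the ones of $n$, so it runs in $O(\log n)$ arithmetic operations, against $O(n)$ for the direct sum.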

The first asymptotic result for $\mathbb{E}(X_n)$ was derived by Bush [ ] about half a century after
d'Ocagne's 1886 paper [33], and he proved that

$$\mathbb{E}(X_n) \sim \frac{q-1}{2} \log_q n,$$

as $n \to \infty$, inspired by an expression derived earlier in Bowden's book [13]². Note that, by (2),

$$\mathbb{E}(X_{a q^k}) = \frac{(q-1)k + a - 1}{2} \qquad (1 \le a < q).$$

Bush proved his formula by providing upper and lower bounds for the sum
$\sum_{m<n} \varepsilon_j(m)$ using the periodicity of $\varepsilon_j$:
$\varepsilon_j(m + q^{j+1}) = \varepsilon_j(m)$. In particular, when $q = 2$, $\varepsilon_j(m)$ is a
sequence starting with a series of $2^j$ zeros followed by $2^j$ ones. His estimates imply
indeed a more precise result (see Figure 4 for $q = 2$)

$$\mathbb{E}(X_n) = \frac{q-1}{2} \log_q n + O(1),$$

where the $O$-term is optimal. Note that this estimate can also be derived easily from d'Ocagne's
expression (2) by observing that the sum $\frac12 (q-1) \sum_{0\le j\le\lambda} \varepsilon_j j q^j$
provides the major contribution, the others being of order $O(n)$.

Figure 4: $\frac12 \log_2 n - \mathbb{E}(X_n)$.

Bellman and Shapiro [ ] were primarily concerned with the binary representation $q = 2$
and provided an independent proof of Bush's result

$$\mathbb{E}(X_n) = \tfrac12 \log_2 n + O(\log\log n).$$

They use two different proofs (one by generating functions and Tauberian theorems and the
other by recurrence) and briefly mention in a footnote that the remainder can be improved to
$O(1)$.

The same paper also initiated a very important notion called "dyadically additive", which
has later on been fruitfully extended and explored mostly under the name of $q$-additivity (and
its multiplicative counterpart $q$-multiplicativity); see [29, 50, 116] for the early publications
and [ ] and the papers cited there for more recent developments.

² We were unable to find a copy of this book.
Mirsky [ ], following [11], proved that

$$\mathbb{E}(X_n) = \frac{q-1}{2} \log_q n + O(1). \qquad (4)$$

His simple, half-page proof is based on the decomposition

$$\sum_{0\le j<n} \nu_q(j) = \sum_{i\ge 0} \sum_{0\le r<q} r\, f(n, i, r),$$

where $f(n, i, r)$ denotes the number of integers $0 \le j < n$ such that $\varepsilon_i(j) = r$. Then (4)
follows from the simple estimate $f(n, i, r) = n/q + O(q^i)$.

Mirsky's result was independently re-derived by Cheo and Yien [ ] and Tang [ ] (judged
to be virtually identical to [ ] in MathSciNet), and referred to as Cheo and Yien's theorem
in [25, 71]. Cheo and Yien proved additionally in [22] a theorem for the density of $X_n$ of the
form

$$\mathbb{P}(X_n = m) \sim \frac1n \cdot \frac{(\log_q n)^m}{m!}$$

for each finite $m \ge 0$.

Drazin and Griffith [35] studied the sum of integer powers of the digits and derived
estimates similar to (4). They also commenced the study of more precise numerical bounds for the
$O(1)$-term in (4), which was followed later in [ 3, 43, 47-49, 95, 115, ]. In particular, no
mention is made in [ , 41, 95, ] of known results for the $O(1)$-term in (4), and
the bounds derived in [ ] about half a century later are weaker than those in [35].

The next stage of refinement was accomplished by Trollope in 1968, where he showed that
the $O(1)$-term in (4) is indeed a periodic function when $q = 2$, for which an explicit expression
is also given. His proof is based on d'Ocagne's formula (3), which he derived in [ ] in a
more general setting.

Delange [ ] made an important step towards the ultimate understanding of the underlying
periodic function. He extended Trollope's result to any base $q \ge 2$ and showed, by a very
simple, elegant, elementary proof, that (see Figure 4 for $q = 2$)

$$\mathbb{E}(X_n) - \frac{q-1}{2} \log_q n = F_1(\log_q n), \qquad (5)$$

where $F_1(x) = F_1(x+1)$ is a continuous, periodic, and nowhere differentiable function
(see also [127, 128]). His expression for $F_1$ is as follows; see Figure 5 for a plot of
$-F_1(x)$ and its first few approximations by $\frac12 \log_2 n - \mathbb{E}(X_n)$.

$$F_1(x) = \frac{q-1}{2}\,(1 - \{x\}) + q^{1-\{x\}}\, g\!\left(q^{\{x\}-1}\right),$$

where $\{x\}$ denotes the fractional part of $x$ and $g(x)$ is a Takagi function [ ]

$$g(x) = \sum_{j\ge 0} q^{-j}\, h(q^j x),$$

with the 1-periodic function $h$ defined by (see Figure 6)

$$h(x) = \int_0^x \left( q\{t\} - \{qt\} - \frac{q-1}{2} \right) dt.$$

Figure 5: $-F_1(x)$: $q = 2$.

Figure 6: $h(x)$ for $q = 2$ (bottom), ..., 6 (top).

Furthermore, the Fourier series expansion of $F_1$ is also computed; see also [ ] for a
systematic approach by analytic means. Delange's proof is based on the simple observation that

$$\left\lfloor \frac{n}{q^j} \right\rfloor - q \left\lfloor \frac{n}{q^{j+1}} \right\rfloor = \int_n^{n+1} \left( \left\lfloor \frac{t}{q^j} \right\rfloor - q \left\lfloor \frac{t}{q^{j+1}} \right\rfloor \right) dt. \qquad (6)$$

His paper [30] has since become a classic and has stimulated much recent research on various
themes related to digital sums and different numeration systems; also different asymptotic tools
have been developed.

In particular, the Trollope-Delange formula (5) for $\mathbb{E}(X_n)$, which is not only an asymptotic
expansion but also an identity for all $n \ge 1$, is not exceptional but a distinguishing feature of
many digital sums; see below and [45, 56, 127] for more examples.
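That the Trollope-Delange formula is an identity for every $n \ge 1$ can be checked numerically in the binary case. The sketch assumes the form $F_1(x) = \frac12(1 - \{x\}) + 2^{1-\{x\}} g(2^{\{x\}-1})$ with $g(x) = \sum_{j\ge0} 2^{-j} h(2^j x)$; for $q = 2$ the integral defining $h$ evaluates to $-\frac12$ times the distance from $x$ to the nearest integer, so $g$ is $-\frac12$ times the classical Takagi function. All function names are illustrative.

```python
from math import log2


def dist_to_int(x):
    """Distance from x to the nearest integer."""
    return abs(x - round(x))


def takagi(x, terms=60):
    """Classical Takagi function sum_j 2^{-j} dist(2^j x, Z), truncated."""
    return sum(dist_to_int(2**j * x) / 2**j for j in range(terms))


def F1(n):
    """Periodic term F_1(log2 n) for q = 2, under the assumed Delange form."""
    lam = n.bit_length() - 1       # floor(log2 n)
    frac = log2(n) - lam           # fractional part of log2 n
    # 2^{1 - frac} = 2^{lam + 1} / n and g = -takagi / 2
    return 0.5 * (1 - frac) - (2**lam / n) * takagi(n / 2**(lam + 1))


def mean_ones(n):
    """E(X_n) for q = 2 by direct enumeration."""
    return sum(bin(j).count("1") for j in range(n)) / n
```

Because $n / 2^{\lambda+1}$ is a dyadic rational, the truncated Takagi series is exact here; the comparison `mean_ones(n) - 0.5 * log2(n)` against `F1(n)` is limited only by floating-point rounding.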



1.2 Beyond the mean: variance, higher moments and limit distribution 

The first paper dealing with the distribution of $X_n$ beyond the mean value is by Katai and
Mogyorodi [70] in 1968. They derived the asymptotic normality of $X_n$ with a rate of the form

$$\sup_x \left| \mathbb{P}\left( \frac{X_n - \frac{q-1}{2} \log_q n}{\sqrt{\frac{q^2-1}{12} \log_q n}} < x \right) - \Phi(x) \right| = O\left( \frac{\log\log n}{\sqrt{\log n}} \right), \qquad (7)$$

where $\Phi$ denotes the standard normal distribution function and the variance is implicit in their
proof, namely,

$$\mathbb{V}(X_n) \sim \frac{q^2-1}{12} \log_q n.$$

Their approach consists in decomposing $X_n$ into sums of a suitable number of independent
random variables, each assuming the values $\{0, 1, \dots, q-1\}$ with equal probability. See (29)
below for the binary case.



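About a decade later, Diaconis [ ] obtained, by Stein's method, an optimal
Berry-Esseen bound for $q = 2$ of the form

$$\sup_x \left| \mathbb{P}\left( \frac{X_n - \frac12 \log_2 n}{\sqrt{\frac14 \log_2 n}} < x \right) - \Phi(x) \right| = O\left( \frac{1}{\sqrt{\log n}} \right); \qquad (8)$$

Figure 7: The Kolmogorov distance $\times \sqrt{\log_2 n}$.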
he also proved that ($q = 2$)



V(X„ 



n 



Moments of $X_n$. Stolarsky [123], in addition to giving a wide list of references, carried out a
systematic study of the asymptotics of the moments of $X_n$ when $q = 2$; in particular, he proved
that



n] 



(9) 



for any positive integer $m$. The $O$-term is however too weak to obtain a more precise asymptotic
approximation to the central moments of $X_n$ of order $\ge 2$.

Later Coquet [ ] in 1986 improved Stolarsky's result by providing a formula of the
Trollope-Delange type ($q = 2$)

$$\mathbb{E}(X_n^m) = \left( \tfrac12 \log_2 n \right)^m + \sum_{0\le j<m} (\log_2 n)^j F_{m,j}(\log_2 n); \qquad (10)$$

here $F_{m,j}$ are bounded, continuous, 1-periodic functions. Coquet's method of proof starts from
defining

$$S^{(m)}(n) := \sum_{0\le j<n} \nu_2(j)^m$$

and shows that the quantity $S^{(m)}(2n) - 2S^{(m)}(n)$ is expressible in terms of sums of $S^{(j)}(n)$ with
$j < m$; then an induction is used. In particular, his result for the second moment implies the
identity

$$\mathbb{V}(X_n) = \frac{\log_2 n}{4} + F_2(\log_2 n),$$

where $F_2$ is continuous and periodic of period 1. Coquet [26] mentioned that the function
$F_2$ is nowhere differentiable and his proofs extend to any $q$-ary base. An independent proof of
the above identity à la Delange was given later by Kirschenhofer [7 ]; see also Osbaldestin [103]
for an interesting discussion of several digital sums, as well as an alternative expression for
$F_2$.

Figure 8: $-F_2(x)$.

On the other hand, Coquet's expressions for the $F_{m,j}$'s (except for $F_2$) are nonconstructive;
see [ ] for the third moment. Dumont and Thomas [ ] studied the moments of $X_n$ in a
general framework and derived more explicit expressions for $F_{m,j}$, as well as properties such
as continuity and nowhere differentiability. Their approach relies on substitutions on finite
alphabets and matrix analysis; see [ ]. In addition to the moments, they also considered in the
same paper [40] the moments of $X_n - \frac{q-1}{2} \log_q n$ and showed that

$$\mathbb{E}\left( X_n - \frac{q-1}{2} \log_q n \right)^m = \frac{1 + (-1)^m}{2} \cdot \frac{m!}{(m/2)!\, 2^{m/2}} \left( \frac{q^2-1}{12} \log_q n \right)^{m/2} + \sum_{0\le j<m/2} (\log_q n)^j F_{m,j}(\log_q n) + o(1), \qquad (11)$$

where the $F_{m,j}$'s are continuous, 1-periodic, and nowhere differentiable functions. These
estimates imply of course the asymptotic normality of $X_n$ by the method of moments, which
Dumont and Thomas later established in [ ] (in a more general framework).

Other constructive expressions, together with interesting functional properties, are derived
by Okada et al. [100], based on binomial measures; see also [97] and the recent paper [78].
Extensions of the same approach to cover the moments of $X_n$ for any $q \ge 2$ were carried out
in [97, 98], the required tools being developed in [ ].

Unaware of Stolarsky's and Coquet's results, Kennedy and Cooper considered the cases
when $q = 10$: $m = 2$ in [71] and any positive integer $m$ in [ ] but with a non-optimal error
term in the corresponding expression of (9) for $q = 10$; see also [ ]. The optimal error term
follows indeed from Dumont and Thomas's result in [ ] (see also [ ]) and was later re-proved
by Yu in [133] (see also [ ] for an extension).

A general procedure, based on the classical approach of Dirichlet series and the Mellin-Perron
integral formula (fully discussed in [ ]), was developed in [ ] and leads to absolutely
convergent Fourier series expansions for $G_{m,j}$. The approach there can be easily extended to
the $q$-ary case.



Probability generating function of $X_n$. By definition, the probability generating function of
$X_n$ is given by

$$\mathbb{E}(y^{X_n}) = \frac1n \sum_{0\le j<n} y^{\nu_q(j)}.$$

The special case $q = y = 2$ of the sum $n\,\mathbb{E}(y^{X_n})$ appeared as the total number of odd
numbers among the binomial coefficients $\binom{j}{i}$ for $0 \le i \le j < n$, a result derived by
Glaisher [ ] in 1899; earlier results of similar character can be found in the papers by
Kummer [79] and by Lucas [85]. For another interesting occurrence in cellular automata,
see [42, 131, 132].
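Glaisher's observation is simple to test: the number of odd entries in the first $n$ rows of Pascal's triangle should equal $\sum_{0\le j<n} 2^{\nu_2(j)}$. The sketch below uses illustrative function names and counts the odd binomial coefficients by brute force.

```python
from math import comb


def odd_binomials(n):
    """Number of odd binomial coefficients C(j, i) with 0 <= i <= j < n."""
    return sum(1 for j in range(n) for i in range(j + 1) if comb(j, i) % 2 == 1)


def power_sum(n, y=2):
    """sum_{0 <= j < n} y^{nu_2(j)}, that is, n * E(y^{X_n}) for q = 2."""
    return sum(y ** bin(j).count("1") for j in range(n))
```

The agreement reflects the fact that row $j$ of Pascal's triangle contains exactly $2^{\nu_2(j)}$ odd entries.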

The distribution of $X_n$ is closely connected to the notion of $q$-additive and $q$-multiplicative
functions, first introduced by Bellman and Shapiro [11], and later systematically investigated
by Gel'fond [ ] and Delange [ ]; see also [ , ] and the references cited there. We did not
find a more complete survey on $q$-additive or $q$-multiplicative functions but a simple search
on MathSciNet resulted in more than 149 papers (as of December 16, 2012); see [12, Ch. 9]
and [ ].

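A function $f : \mathbb{N} \to \mathbb{C}$ is said to be $q$-multiplicative if

$$f(aq^r + b) = f(aq^r) f(b),$$

for $1 \le a \le q-1$ and $0 \le b < q^r$, $r \ge 1$. This implies that $f(0) = 1$. Similarly, one defines
$q$-additive functions by $f(aq^r + b) = f(aq^r) + f(b)$. By definition, one then obtains, for a
$q$-multiplicative function $f$ (see [29, 50])

$$\sum_{0\le m<n} f(m) = \sum_{0\le j\le\lambda} \left( \prod_{0\le i<j} \Bigl( 1 + \sum_{1\le r<q} f(r q^i) \Bigr) \right) \left( \prod_{j<i\le\lambda} f(\varepsilon_i q^i) \right) \sum_{0\le r<\varepsilon_j} f(r q^j),$$

where $n = \sum_{0\le j\le\lambda} \varepsilon_j q^j$. Taking $f(n) = y^{\nu_q(n)}$, which is obviously a
$q$-multiplicative function, we obtain, by re-grouping nonzero summands,

$$\mathbb{E}(y^{X_n}) = \frac1n \sum_{1\le j\le s} y^{c_1 + \cdots + c_{j-1}} \left( 1 + y + \cdots + y^{c_j - 1} \right) \left( 1 + y + \cdots + y^{q-1} \right)^{\lambda_j}, \qquad (12)$$

where

$$n = c_1 q^{\lambda_1} + c_2 q^{\lambda_2} + \cdots + c_s q^{\lambda_s},$$

with $\lambda_1 > \cdots > \lambda_s \ge 0$ and $c_j \in \{1, \dots, q-1\}$. The closed-form expression (12) was later
derived and stated explicitly by Stein [119].

Special cases of (12) appeared in Roberts [109] for $q = y = 2$ (later re-derived in [123]),
and in Stein [ ] for $q = 2$, which has the form

$$\mathbb{E}(y^{X_n}) = \frac1n \sum_{1\le j\le s} y^{j-1} (1 + y)^{\lambda_j}, \qquad (13)$$

when $n = 2^{\lambda_1} + 2^{\lambda_2} + \cdots + 2^{\lambda_s}$, where $\lambda_1 > \lambda_2 > \cdots > \lambda_s$.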
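The binary closed form can be checked against direct enumeration. The sketch below assumes the reading $n\,\mathbb{E}(y^{X_n}) = \sum_{1\le j\le s} y^{j-1}(1+y)^{\lambda_j}$ for $n = 2^{\lambda_1} + \cdots + 2^{\lambda_s}$ with $\lambda_1 > \cdots > \lambda_s \ge 0$; the function names are illustrative, and exact arithmetic is used so that equality is tested without rounding.

```python
from fractions import Fraction


def one_positions(n):
    """Exponents lambda_1 > ... > lambda_s of the ones in the binary expansion of n."""
    return [i for i in range(n.bit_length() - 1, -1, -1) if (n >> i) & 1]


def pgf_sum_closed(n, y):
    """n * E(y^{X_n}) from the assumed closed form, one term per binary block."""
    return sum(
        y ** (j - 1) * (1 + y) ** lam
        for j, lam in enumerate(one_positions(n), start=1)
    )


def pgf_sum_brute(n, y):
    """n * E(y^{X_n}) by direct summation over 0, 1, ..., n-1."""
    return sum(y ** bin(m).count("1") for m in range(n))
```

Passing `y = Fraction(1, 2)` or an integer keeps every intermediate value exact.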

In the same paper [119], Stein also obtained many bounds for the exponential sum (12); in
particular, the function

$$G(\log_q n; y) := \frac{\mathbb{E}(y^{X_n})}{n^{\log_q(1 + y + \cdots + y^{q-1}) - 1}} \qquad (y > 0)$$

is bounded and periodic ($G(x; y) = G(x+1; y)$).

Okada et al. [97, 101] later gave more explicit expressions for the periodic function $G$ by
multinomial measures. A different approach was provided in [ ]. A Fourier expansion for
$q = 2$ was given in [56], which is absolutely convergent when $\sqrt2 - 1 < y < \sqrt2 + 1$.

The closed-form expression (12) contains much information; for example, d'Ocagne's
formula (2) follows from (12) by taking the derivative with respect to $y$ and then substituting
$y = 1$. We will see later that (12) is also helpful in proving effective approximations for
distances between $X_n$ and some binomials.

For other approaches to $q$-additive and $q$-multiplicative functions, see [2, 36, 55, 88, 90, 91,
99].

1.3 Asymptotic distribution of sum-of-digits function 

We mentioned Katai and Mogyorodi's ([ ]) and Diaconis's ([31]) Berry-Esseen bounds for
$X_n$. We group here known results concerning limit and approximation theorems for $X_n$
according to the major approach used, focusing mostly on the case $q = 2$ for simplicity of presentation
and comparison. See Table 1 for a summary.
Authors & Papers        | Year | Results                               | Approach                | Notes
------------------------|------|---------------------------------------|-------------------------|-----------
Katai & Mogyorodi [ ]   | 1968 | CLT+rate                              | Elementary              | q-ary
Diaconis [ ]            | 1977 | CLT+rate                              | Stein's method          | binary
Schmidt [114]           | 1983 | Multivariate CLT                      | Probabilistic           | binary
Schmid [113]            | 1984 | Multivariate LLT+rate                 | Matrix GF, Markov chain | binary
Stein [121]             | 1986 | Binomial approximation                | Stein's method          | binary
Dumont & Thomas [40]    | 1992 | CLT                                   | Method of moments       | general
Loh [84]                | 1992 | Multinomial approximation             | Stein's method          | q-ary
Barbour & Chen [7]      | 1992 | Approximation by a mixture of binomial | Stein's method         | binary
Grabner [ ]             | 1993 | (implicit)                            | Mellin transform        | q-additive
Bassily & Katai [ ]     | 1995 | CLT                                   | Method of moments       | q-additive
Manstavicius [ ]        | 1997 | Functional CLT                        | Probabilistic           | q-additive
Dumont & Thomas [ ]     | 1997 | LLT+rate                              | Markov chain            | general
Drmota & Gajdosik [38]  | 1998 | LLT+rate                              | Generating function     | general
Drmota et al. [37]      | 2003 | Functional CLT                        | Probabilistic           | q-ary

Table 1: A summary of known approaches leading to the asymptotic normality of $X_n$; here CLT
denotes "central limit theorem" and LLT "local limit theorem".



1.3.1 Classical probabilistic approach 

Katai and Mogyorodi's approach uses elementary probability tools and bases their
Berry-Esseen bound (7) on the following decomposition (for $q = 2$)

$$\mathbb{P}(X_n = \ell) = \frac1n \sum_{1\le j\le s} 2^{\lambda_j}\, \mathbb{P}(Y_{\lambda_j} = \ell - j + 1), \qquad (14)$$

which follows immediately from (13). Here $Y_j$ denotes the sum of $j$ independent Bernoulli
indicators, each assuming 0 and 1 with equal probability 1/2. The identity (14) implies that the
random variable $X_n$ is itself a mixture of independent binomial distributions. The remaining
proof then proceeds along standard classical lines (by using estimates for sums of independent
random variables).
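The mixture structure can be made explicit in code. The sketch below assumes the reading $\mathbb{P}(X_n = \ell) = \frac1n \sum_{1\le j\le s} 2^{\lambda_j}\, \mathbb{P}(Y_{\lambda_j} = \ell - j + 1)$, in which block $j$ holds $2^{\lambda_j}$ integers with $j - 1$ fixed leading ones and $\lambda_j$ free bits; since $2^{\lambda_j}\,\mathbb{P}(Y_{\lambda_j} = k) = \binom{\lambda_j}{k}$, only binomial coefficients are needed. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def one_positions(n):
    """Exponents lambda_1 > ... > lambda_s of the ones in the binary expansion of n."""
    return [i for i in range(n.bit_length() - 1, -1, -1) if (n >> i) & 1]


def pmf_mixture(n):
    """Law of X_n assembled as a mixture of shifted Binom(lambda_j, 1/2) laws."""
    pmf = Counter()
    for j, lam in enumerate(one_positions(n), start=1):
        for k in range(lam + 1):
            # block j has weight 2^lam / n and is shifted by its j - 1 leading ones
            pmf[k + j - 1] += Fraction(comb(lam, k), n)
    return dict(pmf)


def pmf_brute(n):
    """Law of X_n by direct enumeration of 0, 1, ..., n-1."""
    counts = Counter(bin(m).count("1") for m in range(n))
    return {k: Fraction(c, n) for k, c in counts.items()}
```

The mixture form costs $O(\log^2 n)$ operations, which makes exact computations feasible for very large $n$.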

Heppner [ ] later proved, in the same spirit, a simple Chernoff-type inequality for $X_n$
when $q = 2$ ($\lambda = \lfloor \log_2 n \rfloor$)

$$\mathbb{P}\left( |X_n - (\lambda+1)/2| > C \right) \le 2\, \mathbb{P}\left( |Y_{\lambda+1} - (\lambda+1)/2| > C \right);$$

since the right-hand side of this inequality decreases exponentially as $C$ grows, one concludes
that $\nu_2(m)$ is close to $(\lambda+1)/2$ for most $m < n$. This observation is useful in establishing
precise estimates for sums of the form $\sum_{m\in B_n} \nu_2(m)$, where $B_n$ is an arbitrary subset of
nonnegative integers $< n$. For results concerning the distribution of $\nu_q(n)$ for given subsequences
of integers (such as prime numbers and squares), see [93, 94] and the references therein. Similar
estimates will be used below.

A central limit theorem for the distribution of the values assumed by the sequence
$\nu_2(3n) - \nu_2(n)$ was derived by Katai [ ], while the corresponding local limit theorem was given
independently by Stolarsky [124]. The proof of Stolarsky's local limit theorem starts from matrix
generating functions, obtaining a closed-form expression, and then applies the saddle-point
method for the corresponding sum. Schmidt [ ] then proved, motivated by Stolarsky's
result [124], a multidimensional central limit theorem (the joint distribution of the values of
$\nu_2(K_1 n), \dots, \nu_2(K_d n)$ for odd numbers $K_1, \dots, K_d$) using tools from Markov chains. The
intuition behind such a limit law is that the two random variables $\nu_2(K_1 U_n)$ and $\nu_2(K_2 U_n)$
are more or less independent, where $U_n \sim$ Uniform$[0, n-1]$ and $K_1$, $K_2$ are odd numbers.

Dumont and Thomas [41] again use Markov chains and large deviations to characterize the
asymptotic distribution of a class of digital sums (covering in particular $X_n$) associated with
substitutions, a Berry-Esseen bound being also derived.



1.3.2 q-multiplicative functions

Since $\nu_q(n)$ is $q$-additive, the function $e^{it\nu_q(n)}$ is $q$-multiplicative. The distribution of the values
of $q$-additive functions has been widely studied in the number-theoretic literature. We mention
briefly an early result. Delange [29] showed that

$$\frac1n \sum_{0\le m<n} f(m) = \prod_{0\le j\le \log_q n} \frac{1 + f(q^j) + \cdots + f((q-1)q^j)}{q} + o(1)$$

for any $q$-multiplicative function $f$ with $|f| \le 1$ and (see [ ])

$$\lim_{k\to\infty} \prod_{0\le j\le k} \left| \frac{1 + f(q^j) + \cdots + f((q-1)q^j)}{q} \right| > 0.$$

This result roughly says that the mean value of $q$-multiplicative functions with bounded modulus
is close to some multinomial distribution.

In particular, if one applies formally this result to $f(n) = e^{it\nu_q(n)}$, then the left-hand side
corresponds to the characteristic function of $X_n$, while the dominant term on the right-hand
side corresponds to a multinomial distribution. We cannot however conclude directly from this result that
$X_n$ is asymptotically multinomially distributed due to lack of uniformity in $t$. For asymptotic
normality and related results for $q$-additive functions, see [9, 10, 88, 122] and [12, Ch. 9].



1.3.3 Stein's method 

Stein's method is a method of probability approximation invented by Charles Stein in 1972
[120]. It does not involve Fourier analysis but hinges on the solution of a functional equation.
In a nutshell, Stein's method can be described as follows. Let $W$ and $Z$ be random variables.
In approximating the distribution $\mathscr{L}(W)$ of $W$ by the distribution $\mathscr{L}(Z)$ of $Z$, the difference
between $\mathbb{E}(h(W))$ and $\mathbb{E}(h(Z))$ for a class of functions $h$ is expressed as

$$\mathbb{E}(h(W)) - \mathbb{E}(h(Z)) = \mathbb{E}(L[f_h](W)),$$

where $L$ is a linear operator and $f_h$ a bounded solution of the equation

$$L[f] = h - \mathbb{E}(h(Z)).$$

The error $\mathbb{E}(L[f_h](W))$ is then bounded by studying the solution $f_h$ and exploiting the
probabilistic properties of $W$. The operator $L$ has the property that $\mathbb{E}(L[f](Z)) = 0$ for a
sufficiently large class of $f$ and therefore characterizes $\mathscr{L}(Z)$. Examples of $L$ are
(i) $L[f](w) = f'(w) - w f(w)$ for normal approximation, that is, if $Z$ has the standard normal
distribution [120], and (ii) $L[f](w) = \lambda f(w+1) - w f(w)$ for Poisson approximation, that is,
if $Z$ has the Poisson distribution with mean $\lambda$ [ ]. The operator $L$ is not unique. It can be
chosen to be the generator of a Markov process whose stationary distribution is the approximating
distribution $\mathscr{L}(Z)$. This generator approach to Stein's method is due to Barbour [ , ].

Using Stein's method with L[f]{w) = f'{w) — wf{w), Diaconis [31] proved that 



sup 



P 



- (A + l)/2 
V(A + l)/4 



^ X 



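which implies (8) since $\lambda = \lfloor \log_2 n \rfloor$. Chen and Shao [19] refined Diaconis's proof to obtain

$$\sup_x \left| \mathbb{P}\left( \frac{X_n - \lambda_0/2}{\sqrt{\lambda_0/4}} \le x \right) - \Phi(x) \right| \le \frac{6.2}{\sqrt{\lambda_0}},$$

where $\lambda_0 := \lceil \log_2 n \rceil$.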

In his book [ ], Stein considered binomial approximation for $X_n$ and, using
the equation

$$L[f](w) = (k - w) f(w) - w f(w - 1) \qquad (15)$$

for $f$ defined on $\{0, 1, \dots, k\}$, he obtained

, Figure 9: q = 2: X 



31415926535897932384 



max 

e 



4 




see also [ ]. By using the generator approach, Loh [84] extended the binary
expansion for $X_n$ to the $q$-ary expansion for any
base $q \ge 2$, and proved that



(The black smooth curve represents the Gaussian 
density with the same mean and the same variance.) 



3.3g3/2(^g- 1) 



[log„ n\ 



where the vector being approximated is the $q$-dimensional random vector whose $i$-th component
is the number of occurrences of the $i$-th digit in the $q$-ary expansion and
$Z \sim$ Multinomial$(\lfloor \log_q n \rfloor; 1/q, \dots, 1/q)$.

Barbour and Chen [7] also used the generator approach to improve the error bound in the
binary expansion case to $1/\lambda$ if the approximating binomial distribution $Y_{\lambda_0}$ is replaced by
Binom$(2e(n), \frac12)$, where $e(n)$ is the mean of $X_n$, or by a mixture of $Y_{\lambda_0 - 1}$ with either
$Y_{\lambda_0}$ or $Y_{\lambda_0 - 2}$ chosen to have mean $e(n)$.
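The characterizing property of a Stein operator can be illustrated with the binomial operator $L[f](w) = (k - w) f(w) - w f(w - 1)$ from (15): its expectation vanishes under Binom($k$, 1/2) for every test function $f$, while under the law of $X_n$ it generally does not. The sketch below uses illustrative names and exact rational arithmetic; taking $f \equiv 1$ and $k = \lambda$ gives $\lambda - 2\,\mathbb{E}(X_n)$.

```python
from fractions import Fraction
from math import comb


def stein_expectation(pmf, k, f):
    """E[(k - W) f(W) - W f(W - 1)] for W distributed as pmf on {0, ..., k}."""
    total = Fraction(0)
    for w, p in pmf.items():
        lower = w * f(w - 1) if w > 0 else 0  # f(-1) is never evaluated
        total += p * ((k - w) * f(w) - lower)
    return total


def binomial_pmf(k):
    """Probability mass function of Binom(k, 1/2)."""
    return {w: Fraction(comb(k, w), 2**k) for w in range(k + 1)}


def ones_pmf(n):
    """Law of the number of ones in the binary expansion of a uniform integer in [0, n)."""
    counts = {}
    for m in range(n):
        ones = bin(m).count("1")
        counts[ones] = counts.get(ones, 0) + 1
    return {w: Fraction(c, n) for w, c in counts.items()}
```

The size of `stein_expectation(ones_pmf(n), lam, f)` over a class of functions `f` is what a Stein-type argument has to control.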
1.3.4 Generating functions and analytic approach



Schmid [113] derived, improving earlier results by Stolarsky [124] and by Schmidt [114], a 
very precise multidimensional local limit theorem of the form 

— # {m : ^ m < n, i'2{Kjra) = kj,j = 1, . . . ,d} 

exp f-^T^ (k - i log2 n) V-i (k - i log2 nY) ^^^^ 
(27rlog2n)'^/2det(V)V2 + U [[logn) ), 

where d ^ 1, the Kj's are odd integers > 1, k = {ki, . . . ,kd) and V is the positive-definite 
d X d matrix with entries 



'Jn P '.= 



:iKj,i^d). 



His proof builds on matrix generating functions and uses tools from Markov chains, following
Stolarsky and Schmidt. In addition to providing an optimal convergence rate for the corresponding
multidimensional central limit theorem (derived in [ ]), his result implies very tight estimates
for the distribution of $\nu_2(Kn) - \nu_2(n)$, a problem receiving much attention in the literature; see
the recent paper [27], the Ph.D. Dissertation [122] and the references therein.

In particular, (16) also leads to a local limit theorem for $X_n$ with optimal rate when $q = 2$.

Drmota and Gajdosik [ ] use generating functions and complex-analytic methods to prove
a local limit theorem for the sum-of-digits function in more general numeration systems; see
the paper by Madritsch [87] and the references cited there for more recent developments.



1.3.5 Other approaches 

We mentioned the result (11) by Dumont and Thomas [ ] for the central moments of $X_n$,
which implies the asymptotic normality of $X_n$ by the Frechet-Shohat moment convergence
theorem.

The same method of moments was later applied by Bassily and Katai [ ] to derive the
asymptotic normality of $q$-additive functionals; see also [52, 86, 87].
Drmota et al. [37] obtained a functional limit theorem for $\nu_q(n)$.



2 New results 

We will derive a few approximation theorems for the distribution of $X_n$; different approaches
will be developed, each having its own advantages and constraints. In particular, an expansion
for a refined version of the total variation distance will be given, which will cover (1) as a
special case.

Here and throughout this paper, we consider only the case $q = 2$ for simplicity. The
following notations will be consistently used. Let $\lambda = \lambda_1 = \lfloor \log_2 n \rfloor$. We then write
$n = \sum_{1\le j\le s} 2^{\lambda_j}$ with $\lambda_1 > \cdots > \lambda_s \ge 0$. Let $Y_\lambda \sim$ Binom$(\lambda, 1/2)$.

Let $H_m(x)$ denote the Hermite polynomials

$$H_m(x) = (-1)^m e^{x^2/2} \frac{d^m}{dx^m} e^{-x^2/2} \qquad (m = 0, 1, \dots).$$

Define a sequence $\{h_m\}$ by

$$h_m := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} |H_m(x)|\, e^{-x^2/2}\, dx \qquad (m = 0, 1, \dots). \qquad (17)$$

Theorem 2.1. Let $X_n$ denote the number of 1's in the binary representation of a random integer,
where each of the integers $\{0, 1, \dots, n-1\}$ is chosen with equal probability $1/n$.
Then



/|^|a (n)| ^ 



/or m = 1,2,..., where the sequence ar{n) = ar{2n) is defined by (see ( 13)) 
anJ A denotes the difference operator 



Two explicit expressions for ar{n) are as follows. 

«^H=E yr^E(x....(x„-£+i)) 



for r = 0, 1, 

Taking $m = 1$ and dividing the left-hand side by 2, we obtain an asymptotic approximation
to the total variation distance; see [20, 117] for similar results.

Corollary 2.2. The total variation distance between the distribution of $X_n$ and that of the
binomial random variable $Y_\lambda$ satisfies

$$d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) = \frac{\sqrt{2}\, |F(\log_2 n)|}{\sqrt{\pi \log_2 n}} + O\bigl((\log n)^{-1}\bigr),$$

where the bounded periodic function $F$ is defined by

$$F(x) = 2^{-x} \sum_{j\ge 0} 2^{-d_j} \Bigl( j - \frac{d_j}{2} \Bigr) \tag{19}$$

for $x \in (0, 1]$, when writing $2^x = \sum_{j\ge 0} 2^{-d_j} \in (1, 2]$ with $0 = d_0 < d_1 < \cdots$, and $F(x+1) = F(x)$
for other values of $x$.
Proof. Observe first that $a_0(n) = 1$ and, by the definition of $F$,

$$F(\log_2 n) = a_1(n) = \mathbb{E}(X_n) - \frac{\lambda}{2},$$

which has the form

$$F(\log_2 n) = \frac{2^\lambda}{n} \sum_{1\le j\le s} 2^{-(\lambda - \lambda_j)} \Bigl( j - 1 - \frac{\lambda - \lambda_j}{2} \Bigr). \tag{20}$$



On the other hand, since $H_1(x) = x$, we then get $h_1 = \sqrt{2/\pi}$. By considering the values of

2^ = E 2-'^^ = 2-'^^ + 2-\ 

we see that F is continuous except at the end points (integers). □ 

By (20), we see that if $\lambda_j = \lambda - 2(j-1)$ for $j = 1, \dots, s$, then $F(\log_2 n) = 0$. This yields
the sequence $\log_2\bigl(\sum_{0\le j\le k} 4^{-j}\bigr)$, $k = 1, 2, \dots$, of locations of zeros of $|F(x)|$.
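The vanishing of the leading term can be checked exactly on small cases. The sketch below computes $a_1(n) = \mathbb{E}(X_n) - \lambda/2$ by brute-force enumeration with rational arithmetic and builds integers whose binary exponents satisfy $\lambda_j = \lambda - 2(j-1)$; the helper names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def mean_offset(n):
    """Exact a_1(n) = E(X_n) - lambda/2, where lambda = floor(log2 n) and
    X_n is the binary digit sum of a uniform integer in {0, ..., n-1}."""
    lam = n.bit_length() - 1
    total_ones = sum(bin(k).count("1") for k in range(n))
    return Fraction(total_ones, n) - Fraction(lam, 2)

def alternating_gap_integer(lam, s):
    """n = 2^lam + 2^(lam-2) + ... (s terms), i.e. lambda_j = lam - 2(j-1).

    Requires lam - 2*(s-1) >= 0.
    """
    return sum(2 ** (lam - 2 * (j - 1)) for j in range(1, s + 1))
```

For instance `mean_offset(alternating_gap_integer(6, 4))` uses $n = 85 = 2^6 + 2^4 + 2^2 + 2^0$, for which the mean of $X_n$ is exactly $\lambda/2 = 3$.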




Figure 10: The fractal nature of the function $|F(x)|$ (plots of $|F(x)|$ on $[0, 1]$ and on the successively smaller windows $[0.34, 0.42]$, $[0.395, 0.420]$, $[0.410, 0.416]$ and $[0.4140, 0.4150]$).
Taking $m = 2$, we obtain a refined estimate with smaller errors.



Corollary 2.3. 



V 27re log2 n 



(21) 



where F2{x) is defined for x E {0,1] by 

' dj{dj + 5) 



.Figure 11: \F2{x 



Fox) 



2-^ J2 2" 




2 2 



by writing 2^ = J2j^o 2 *^^ ^■^ above, and 
F2{x + 1) = F2{x) for other values of x 
(see Figure 11 for a plot). 

The two functions $F(x)$ and $F_2(x)$ may assume the value zero when $x$ is not an integer.
This means that in such cases the error term is of a smaller order, and the right-hand side of
our result gives simply an $O$-estimate. One naturally wonders if there are other simple uniform
approximants for the total variation distance. We propose a simple one in the following.

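Theorem 2.4. The total variation distance between the distribution of $X_n$ and the binomial
distribution of $Y_\lambda$ satisfies

$$d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) \asymp \frac{1}{2^{\lambda - \lambda_2}} \min\Bigl\{ 1, \frac{\lambda - \lambda_2}{\sqrt{\lambda}} \Bigr\},$$

whenever $\lambda - \lambda_2 \ge c$, where $c > 0$ is sufficiently large.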



This result is similar to the estimate proved by Soon [117] (see also [20]), where he
considered the distance $d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_{\lambda+1}))$ instead of $d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda))$ by using Stein's
method.

We see roughly that the wider the gap between $\lambda$ and $\lambda_2$, the smaller the total variation
distance is.
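The dependence on the gap $\lambda - \lambda_2$ can be explored exactly for small $\lambda$. The following sketch computes the total variation distance (taken here as half the $\ell_1$-distance, an assumption consistent with the "dividing by 2" remark above) for $n = 2^\lambda + 2^{\lambda - \text{gap}}$. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def tv_to_binomial(n):
    """Exact d_TV between X_n (binary digit sum of a uniform integer in
    {0, ..., n-1}) and Binom(lambda, 1/2), as (1/2) * sum |p - q|."""
    lam = n.bit_length() - 1
    counts = [0] * (lam + 2)
    for k in range(n):
        counts[bin(k).count("1")] += 1
    l1 = sum(abs(Fraction(counts[k], n) - Fraction(comb(lam, k), 2 ** lam))
             for k in range(lam + 2))
    return l1 / 2

def tv_by_gap(lam):
    """Map gap -> d_TV for n = 2^lam + 2^(lam - gap), gap = 1, ..., lam."""
    return {gap: tv_to_binomial(2 ** lam + 2 ** (lam - gap))
            for gap in range(1, lam + 1)}
```

For such two-term integers the distance is at most the weight $1/(2^{\text{gap}} + 1)$ of the second block, which makes the exponential decay in the gap visible; printing `{g: float(v) for g, v in tv_by_gap(10).items()}` gives a small table to compare with the theorem.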

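On the other hand, the theorem fails when $c = 2$. In this case, $n = 2^\lambda + 2^{\lambda-2}$ and, by (21)
or by a direct calculation,

$$d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) \asymp \frac{1}{\lambda}. \tag{22}$$

More generally, if

$$n = 2^\lambda + 2^{\lambda-2} + \cdots + 2^{\lambda-2k} + n_0,$$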



where ^ 1 and no = 0{n/X^/^), then F(log2 n) = 0{X-^^^), and (22) holds. 

All these results can be extended to $\nu_q(n)$. The major difference is to use the generating
function (12) instead of (13).
3 Proofs



We first prove Theorem 2.1 by a direct analytic approach based on Fourier analysis. A closely
connected semigroup approach (first developed by Deheuvels and Pfeifer for the Poisson
distribution; see [28]), which relies on more algebraic formulations and manipulations, can also be used
for the same purpose; see [110]. Then Theorem 2.4 is proved by two different approaches: one
by Stein's method, and the other by a standard probability argument, which starts from
decomposing the distribution of $X_n$ into a sum of binomial distributions. Our adaptation of Stein's
method indeed leads to a refinement of Theorem 2.4, which will be given in Section 3.3. For
more methodological interests, we also include another approach using the Krawtchouk poly- 
nomials and the Parseval identity, which is the binomial analogue of the Charlier-Parseval ap- 
proach developed earlier in detail in [ 1 iJ]. 



Approach                  Result                         Section
Analytic                  Thm 2.1 (for extended d_TV)    3.1
Elementary probability    Thm 2.4 (for d_TV)             3.2
Stein's method            Thm 2.4 (for d_TV)             3.3
Krawtchouk-Parseval       χ²-distance                    3.4

Table 2: A summary of approaches used and results proved in this section.



3.1 Analytic approach: proof of Theorem 2.1 

We now prove Theorem 2.1 and write the proof in a more general way that can be readily 
amended for dealing with other cases such as Gray codes; see Section 4.2 below. 

3.1.1 Probability generating function 

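Let

$$P_n(y) := \mathbb{E}(y^{X_n}) = \frac{1}{n} \sum_{0\le k<n} y^{\nu_2(k)}.$$

Then

$$P_{2n}(y) = \frac{1+y}{2} P_n(y), \tag{23}$$

and $P_{2^k}(y) = (1+y)^k / 2^k$. Note that $\nu_2(n) \le \lambda + 1$.
For convenience, let

$$Q_n(y) = n P_n(y).$$

In terms of $Q_n$, the relation (23) has the form

$$Q_{2n}(y) = (1+y) Q_n(y).$$

For odd numbers, we have

$$Q_{2n+1}(y) = (1+y) Q_n(y) + y^{\nu_2(n)}.$$

These two recurrences can be written as

$$Q_n(y) = (1+y) Q_{\lfloor n/2 \rfloor}(y) + \varepsilon_n\, y^{\nu_2(\lfloor n/2 \rfloor)},$$

for all $n \ge 0$, where

$$\varepsilon_n := \frac{1 - (-1)^n}{2}.$$

By iteration, we then get

$$Q_n(y) = \sum_{0\le j\le \lambda} \varepsilon_{\lfloor n/2^j \rfloor}\, y^{\nu_2(\lfloor n/2^{j+1} \rfloor)} (1+y)^j
= (1+y)^\lambda \sum_{0\le j\le \lambda} \varepsilon_{\lfloor 2^{\{\log_2 n\}+j} \rfloor}\, y^{\nu_2(\lfloor 2^{\{\log_2 n\}+j-1} \rfloor)} (1+y)^{-j} \tag{24}$$

for any $n \ge 1$; compare (13). This means that $P_n$ has the form

$$P_n(y) = \Bigl( \frac{1+y}{2} \Bigr)^{\lambda} \phi_n(y),$$

where

$$\phi_n(y) = \frac{2^\lambda}{n} \sum_{0\le j\le \lambda} \frac{\delta_j\, y^{p_j}}{(1+y)^j},$$

and $p_j$ are nonnegative integers such that $p_j \le j$ and $|\delta_j| \le 1$.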
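The two recurrences for $Q_n(y) = \sum_{0\le k<n} y^{\nu_2(k)}$, namely $Q_{2n} = (1+y)Q_n$ and $Q_{2n+1} = (1+y)Q_n + y^{\nu_2(n)}$, are easy to confirm by machine. In the sketch below a polynomial is represented by its list of integer coefficients; all helper names are hypothetical.

```python
def trim(coeffs):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def Q(n):
    """Coefficient list of Q_n(y) = sum_{0 <= k < n} y^{nu_2(k)}."""
    coeffs = [0] * (n.bit_length() + 1)
    for k in range(n):
        coeffs[bin(k).count("1")] += 1
    return trim(coeffs)

def times_one_plus_y(coeffs):
    """Multiply a polynomial by (1 + y)."""
    coeffs = list(coeffs)
    return trim([a + b for a, b in zip(coeffs + [0], [0] + coeffs)])

def add_monomial(coeffs, exponent):
    """Add y^exponent to a polynomial."""
    coeffs = list(coeffs) + [0] * max(0, exponent + 1 - len(coeffs))
    coeffs[exponent] += 1
    return trim(coeffs)
```

With these helpers, `Q(2 * n) == times_one_plus_y(Q(n))` and `Q(2 * n + 1) == add_monomial(times_one_plus_y(Q(n)), bin(n).count("1"))` can be checked for any range of `n`.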
3.1.2 Local expansion of $\phi_n(y)$

The approach we use here relies on the intuition that if $\phi_n$ is sufficiently "smooth" then $X_n$ is
close to the binomial distribution $Y_\lambda$. More precisely, let

$$\phi_n(y) = \sum_{r\ge 0} a_r(n) (y-1)^r;$$

cf. (18).

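Lemma 3.1. For each $m \ge 1$, we have

$$\Bigl| \phi_n(y) - \sum_{0\le r<m} a_r(n) (y-1)^r \Bigr| \le \frac{3}{2} \cdot \frac{2^\lambda}{n} \cdot \frac{(2|y-1|)^m}{1 - 2|y-1|}, \tag{25}$$

if $|y - 1| \le 1/2 - \varepsilon$, $\varepsilon > 0$ being an arbitrarily small number.

Proof. We indeed prove a stronger estimate

$$\frac{n}{2^\lambda} |a_r(n)| \le 3 \cdot 2^{r-1}$$

for all $r \ge 1$, which then implies (25).
Let $[y^r] f(y)$ denote the coefficient of $y^r$ in the Taylor expansion of $f$. Since $p_j \le j$, we
have, for $r \ge 1$,

$$\frac{n}{2^\lambda} |a_r(n)| = \Bigl| [w^r] \sum_{1\le j\le \lambda} \frac{\delta_j (1+w)^{p_j}}{(2+w)^j} \Bigr|
\le [w^r] \sum_{j\ge 1} \Bigl( \frac{1+w}{2-w} \Bigr)^{j} = [w^r] \frac{1+w}{1-2w} = 3 \cdot 2^{r-1},$$

as required. □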



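3.1.3 An asymptotic expansion for $\mathbb{P}(X_n = k)$

Proposition 3.2. For all integers $0 \le k \le \lambda$ and each $m \ge 1$, we have

$$\mathbb{P}(X_n = k) = 2^{-\lambda} \sum_{0\le r<m} (-1)^r a_r(n)\, \Delta^r \binom{\lambda}{k} + O\Bigl( \frac{2^{3m/2}\, \Gamma((m+1)/2)}{\lambda^{(m+1)/2}} \Bigr),$$

uniformly in $k$.

Proof. By Cauchy's integral formula for the coefficient of an analytic function, we have

$$\mathbb{P}(X_n = k) = \frac{1}{2\pi i} \oint_{|y|=1} y^{-k-1} P_n(y)\, dy
= \frac{1}{2\pi} \Bigl( \int_{|t|\le 1/2} + \int_{1/2\le |t|\le \pi} \Bigr) e^{-ikt} \Bigl( \frac{1+e^{it}}{2} \Bigr)^{\lambda} \phi_n(e^{it})\, dt
=: J_1 + J_2.$$

Since $e^{it}$ lies inside the circle $|y - 1| = 1/2$ when $|t| \le 1/2$, we evaluate $J_1$ by applying Lemma 3.1 and obtain

$$J_1 = \frac{1}{2\pi} \sum_{0\le r<m} a_r(n) \int_{-1/2}^{1/2} e^{-ikt} \Bigl( \frac{1+e^{it}}{2} \Bigr)^{\lambda} (e^{it} - 1)^r\, dt
+ O\Bigl( \int_{-1/2}^{1/2} \Bigl| \frac{1+e^{it}}{2} \Bigr|^{\lambda} |1 - e^{it}|^m\, dt \Bigr).$$

The integral in the $O$-term is then estimated as follows.

$$\int_{-1/2}^{1/2} \Bigl| \frac{1+e^{it}}{2} \Bigr|^{\lambda} |1 - e^{it}|^m\, dt
= 2^m \int_{-1/2}^{1/2} \Bigl( \cos\frac{t}{2} \Bigr)^{\lambda} \Bigl| \sin\frac{t}{2} \Bigr|^{m} dt
= O\Bigl( \frac{2^{3m/2}\, \Gamma((m+1)/2)}{\lambda^{(m+1)/2}} \Bigr). \tag{26}$$

Now substituting this estimate into the expression of $J_1$ and using the relation

$$\Delta^r \binom{\lambda}{k} = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikt} (1 + e^{it})^{\lambda} (1 - e^{it})^r\, dt, \tag{27}$$

we obtain

$$J_1 = 2^{-\lambda} \sum_{0\le r<m} (-1)^r a_r(n)\, \Delta^r \binom{\lambda}{k} + O\Bigl( \frac{2^{3m/2}\, \Gamma((m+1)/2)}{\lambda^{(m+1)/2}} \Bigr),$$

where the error caused by extending the range of integration from $|t| \le 1/2$ to $|t| \le \pi$ is exponentially small in $\lambda$, since $|(1+e^{it})/2| = \cos(t/2) \le \cos(1/4) < 1$ for $1/2 \le |t| \le \pi$.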



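On the other hand, by (13),

$$\max_{1/2\le |t|\le \pi} |P_n(e^{it})| \le \max_{1/2\le |t|\le \pi} \frac{1}{n} \sum_{1\le j\le s} |1 + e^{it}|^{\lambda_j}
\le \frac{1}{n} \sum_{\lambda_j \ge \varepsilon\lambda} 2^{\lambda_j} \exp\Bigl( -\frac{\lambda_j}{8\pi^2} \Bigr) + \frac{1}{n} \sum_{\lambda_j < \varepsilon\lambda} 2^{\lambda_j}
= O\Bigl( \exp\Bigl( -\frac{\varepsilon\lambda}{8\pi^2} \Bigr) + \lambda\, 2^{-(1-\varepsilon)\lambda} \Bigr).$$

Choosing

$$\varepsilon = \frac{\log 2}{\log 2 + 1/(8\pi^2)},$$

so as to balance the two terms in the $O$-symbol, we obtain

$$\max_{1/2\le |t|\le \pi} |P_n(e^{it})| = O\bigl( \lambda e^{-c_0 \lambda} \bigr),$$

where

$$c_0 = \frac{\log 2}{1 + 8\pi^2 \log 2}.$$

Thus

$$J_2 = O\bigl( \lambda e^{-c_0 \lambda} \bigr).$$

This proves the proposition. □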
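3.1.4 Estimates for the differences of binomial coefficients

Lemma 3.3. For $r \ge 0$, we have

$$2^{-\lambda} \max_{0\le k\le \lambda} \Bigl| \Delta^r \binom{\lambda}{k} \Bigr| = O\Bigl( \frac{2^{3r/2}\, \Gamma((r+1)/2)}{\lambda^{(r+1)/2}} \Bigr),$$

$$2^{-\lambda} \sum_{0\le k\le \lambda} \Bigl| \Delta^r \binom{\lambda}{k} \Bigr| = \frac{2^r h_r}{\lambda^{r/2}} \bigl( 1 + O(\lambda^{-1}) \bigr), \tag{28}$$

where $h_r$ is defined in (17).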

Proof. By (27) and an analysis similar to that used in (26), we obtain 



max 

0€k<\ 



2A+r pTT 



27r 

2A+r /•! 



vr 



/TV 
(cos I) I sin II dt, 
-TT 

(l_t)(A-l)/2^('^-l)/2d^ 



o 



2^+3^'/2r((r + l)/2) 



A('^+i)/2 

For the proof of (28), we apply the standard saddle-point method and obtain 



2-. ^ 



A^ 



A; 



O^fc^A 



1 

2^ 



A:=A/2+xVA/2 
3;=o(Ai/e) 

2r/2 roo 



2 

2r+l 



^1 _ e^*)'^e-'='* dt 



27rA('^+i)/2 



/CO 
(-«t)''e-*'/2-"*dt 
-oo 



(1 + (A-)) 



/oo 
|if,(x)|e-"'/Ma;(l + 0(A-^)) 
-oo 



proving (28) by (17). 

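□

Note that when $r = 1$, we have the closed-form expression

$$\sum_{0\le k\le \lambda} \Bigl| \Delta \binom{\lambda}{k} \Bigr| = 2 \binom{\lambda}{\lfloor \lambda/2 \rfloor} - 1.$$

For higher values of $r$, a closed-form expression can be derived for $\sum_{0\le k\le \lambda} |\Delta^r \binom{\lambda}{k}|$ in terms
of the zeros of Krawtchouk polynomials; see [i iw] for $r = 2$.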



3.1.5 Proof of Theorem 2.1

Proof. Applying Proposition 3.2 with $m$ replaced by $m + 2$, we get

$$\sum_{0\le k\le \lambda} \Bigl| \mathbb{P}(X_n = k) - 2^{-\lambda} \sum_{0\le r<m+2} (-1)^r a_r(n)\, \Delta^r \binom{\lambda}{k} \Bigr| = O\Bigl( \frac{2^{3m/2}\, \Gamma((m+3)/2)}{\lambda^{(m+1)/2}} \Bigr).$$

Note that the sum over all $k$ for the terms corresponding to $r = m + 1$ is of order $\lambda^{-(m+1)/2}$.
Theorem 2.1 then follows from (28). □

We will later formulate a simple framework of numeration systems for which the same type
of results as for $X_n$ hold, using the same method of proof.
3.2 Elementary probability approach: proof of Theorem 2.4

A crucial observation that will be elaborated here is the fact that $X_n$ is itself a mixture of
binomial distributions. More precisely, by the decomposition of Katai and Mogyorodi (14),

$$\mathbb{P}(X_n = k) = \frac{1}{n} \sum_{1\le j\le s} 2^{\lambda_j}\, \mathbb{P}(Y_{\lambda_j} = k - j + 1). \tag{29}$$

A direct probabilistic proof of the above relation is as follows. Suppose $U_n$ is uniformly
distributed on $\{0, 1, \dots, n-1\}$; then by definition $X_n = \nu_2(U_n)$. First, we have the relation

=n (A; = 1,2,...). 
On the other hand, since z/2(2'' + j) = 1 + z/2(j) if ^ j < 2^, we also have 

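We can now split the interval $\{0, 1, \dots, n-1\} = \bigcup_{0\le j<s} A_j$, where $A_0 = [0, 2^{\lambda_1})$ and

$$A_j = \Bigl[ \sum_{1\le i\le j} 2^{\lambda_i},\ \sum_{1\le i\le j+1} 2^{\lambda_i} \Bigr), \tag{30}$$

for $1 \le j \le s - 1$. Clearly, $\mathbb{P}(U_n \in A_j) = 2^{\lambda_{j+1}} / n$. We then obtain (29).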
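The mixture representation (29), $\mathbb{P}(X_n = k) = \frac{1}{n}\sum_{1\le j\le s} 2^{\lambda_j}\,\mathbb{P}(Y_{\lambda_j} = k-j+1)$, can be verified exactly against direct enumeration. The sketch below does so with rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binary_exponents(n):
    """Exponents lambda_1 > lambda_2 > ... > lambda_s of the binary expansion of n."""
    return [i for i in range(n.bit_length() - 1, -1, -1) if (n >> i) & 1]

def pmf_direct(n, k):
    """P(X_n = k) by enumerating the integers 0, ..., n-1."""
    return Fraction(sum(1 for u in range(n) if bin(u).count("1") == k), n)

def pmf_mixture(n, k):
    """Right-hand side of (29)."""
    total = Fraction(0)
    for j, lam_j in enumerate(binary_exponents(n), start=1):
        m = k - j + 1
        if 0 <= m <= lam_j:
            # 2^{lam_j} * C(lam_j, m) / 2^{lam_j} simplifies to C(lam_j, m).
            total += comb(lam_j, m)
    return total / n
```

The $j$-th block contributes integers whose leading $j-1$ ones are fixed and whose remaining $\lambda_j$ bits are free, which is why the binomial variable appears shifted by $j-1$.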

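We group in the following lemma a few simple properties of the total variation distances
involving $Y_k$, which will be needed later.

Lemma 3.4. Let $Y_k$ be a binomial random variable with parameters $k$ and $1/2$. Then

$$d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_k + 1)) = O(k^{-1/2}),$$

$$d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_{k+1})) = \tfrac{1}{2}\, d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_k + 1)) = O(k^{-1/2}),$$

$$d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_{k+j} + i)) = O\bigl( (j + i) k^{-1/2} \bigr).$$

Proof. Since $\mathbb{P}(Y_k = j)$ increases monotonically in the interval $[0, \lfloor k/2 \rfloor]$ and decreases
monotonically in the interval $(\lfloor k/2 \rfloor, k]$, we have

$$d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_k + 1)) = \mathbb{P}(Y_k = \lfloor k/2 \rfloor) = O(k^{-1/2}).$$

In a similar way, since $\mathbb{P}(Y_{k+1} = j) = \mathbb{P}(Y_k + I = j) = (\mathbb{P}(Y_k = j - 1) + \mathbb{P}(Y_k = j))/2$,
where $I$ is Bernoulli with parameter $1/2$, we get

$$d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_{k+1})) = \frac{1}{4} \sum_{j} |\mathbb{P}(Y_k = j) - \mathbb{P}(Y_k = j - 1)|
= \frac{1}{2}\, d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_k + 1)).$$

This proves the lemma. □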
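The second relation of the lemma, $d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_{k+1})) = \frac12 d_{TV}(\mathscr{L}(Y_k), \mathscr{L}(Y_k+1))$, is an exact identity and can be checked with rational arithmetic. The sketch assumes the normalisation $d_{TV} = \frac12\sum|p-q|$; names are illustrative.

```python
from fractions import Fraction
from math import comb, sqrt

def binom_pmf(k):
    """Probability mass function of Binom(k, 1/2) as a list indexed by 0..k."""
    return [Fraction(comb(k, j), 2 ** k) for j in range(k + 1)]

def tv(p, q):
    """Total variation distance (1/2) * sum |p_j - q_j| between two pmf lists."""
    size = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (size - len(p))
    q = list(q) + [Fraction(0)] * (size - len(q))
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def tv_shift(k):
    """d_TV between Y_k and Y_k + 1 (the same law shifted by one)."""
    return tv(binom_pmf(k), [Fraction(0)] + binom_pmf(k))

def tv_next(k):
    """d_TV between Y_k and Y_{k+1}."""
    return tv(binom_pmf(k), binom_pmf(k + 1))
```

Multiplying `tv_shift(k)` by `sqrt(k)` for growing `k` gives a bounded sequence, in line with the $O(k^{-1/2})$ rate.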
3.2.1 Proof of Theorem 2.4 when $\lambda - \lambda_2 \le \sqrt{\lambda}$

Consider first the case when $\lambda - \lambda_2 \le \sqrt{\lambda}$. By (29) and Lemma 3.4, we have



^ i V 2^^c?tv(^(V'a, + J - l),^(n)) 



n 



<2 E 

A-A2 



A; 



O 



giving an upper bound for the total variation distance. 

To obtain a lower bound, we apply again (31) and Lemma 3.4. 

r? — 2^ 

2rfTv(^(X„),^(lA)) ^ f/Tv(^(lA, + l),^(n)) 

n 



>^ rfTv(^(lA, + l),^(n)) 



n 



(A2-A,+j) 



> "^d^y^^iY,^ + l),^(n)) + O ^2'^^^^ - 



n 



2^v^ 
Now, by Lemma 3.4, 

C?Tv(^(n, + 1),^(Fa)) = rfTv(^(lAj,^(>A)) +0(A-l/2)^ 

3.2.2 Proof of Theorem 2.4 when $c \le \lambda - \lambda_2 \le \sqrt{\lambda}$

For $c \le \lambda - \lambda_2 \le \sqrt{\lambda}$, where $c > 0$ is sufficiently large, we have

rfTv(^(>Aj, ^{Yx)) > P (n, ^ - P (1a ^ v^/2) 



v^/2 - A2/2 



/A^/2 

a; 



v^/2 - A/2 
v^/2 



(31) 



0(A 



-1/2^ 



$(v^-yA)+0(A-i/2) 



A- A2 

TT 
for $\varepsilon > 0$, by the central limit theorem for the binomial distribution (with rate). Combining the
upper and lower bounds, we get



dTv(^(X„),^(lA)) 



if $c \le \lambda - \lambda_2 \le \sqrt{\lambda}$ and $c$ is sufficiently large.



A-A2 



3.2.3 Proof of Theorem 2.4 when $(\lambda - \lambda_2)/\sqrt{\lambda} \to \infty$

The lower bound becomes less precise if $(\lambda - \lambda_2)/\sqrt{\lambda} \to \infty$. In this case, we first observe that
the total variation distance does not exceed 1; thus



n 



Take C = (A - \2)/\^. We have 

dM^iYx. + 1), ^(n)) ^ P ^ ^ - J v^) - P (ia, + 1 ^ ^ - ^ v^^ 



> P 



p 



p 



A 



2 47 
C rT-\ ^( A2 



^ -V^l -P 



4 



Applying Chebyshev's inequality, we get 

1 ^ f/Tv(^(>A. + 1), ^(n)) ^1+0 

if C ^ 8. 

When $\lambda_3 < \lambda_2 - 1$, we have the lower bound



ciTv(=Sf(X„),i?(lA)) ^ 



2^^dTv(^(yA. + 1),^(>a)) - 2^ 

rfTv(=Sf'(rA, + i),^(n))-i/2 



2A-A2 



1 

2A-A2+1 



(i + o(c-)). 
On the other hand, when $\lambda_3 = \lambda_2 - 1$, we use (31) and get

2ciTv(^(Xn), ^(n)) ^ - 5^ 1 (P(n, =i-l)- P(n = £)) 



n 



_|_ . . . _|_ 2^^= 

+2^3 (^^Yx, =i-2)-F{Yx = i))\ 



n 



(2^' + 2^') (P(>A2 = i-l)- P(1a = ^)) 

2A4+1 



n 

i>0 



n 



+2"« (p(n3 = ^ - 2) - p(rA. = ^ - 1)) I 

s^(i+or')). 

This completes the proof of Theorem 2.4. 

3.3 Stein's method: an alternative proof of Theorem 2.4 

The sum-of-digits function was among the first instances used to demonstrate the
effectiveness of Stein's method (see [31,121]) with an optimal approximation rate. This method
centers on exploiting an equation that characterizes the limiting measure, which, in the case of
the binomial distribution, is given by (15) and can be derived in the following way.

3.3.1 Stein's equation for binomial distribution 

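Since $Y_k$ is binomially distributed with parameters $k$ and $p \in (0, 1)$, with $q := 1 - p$, we see that the probabilities
$\mathbb{P}(Y_k = j)$ satisfy the difference equation

$$q\bigl( (j+1)\, \mathbb{P}(Y_k = j+1) - j\, \mathbb{P}(Y_k = j) \bigr) = \bigl( p(k-j) - jq \bigr)\, \mathbb{P}(Y_k = j). \tag{32}$$

Following Stein's idea [120] for deriving the characteristic equation for the normal distribution

$$\mathbb{E} f'(\mathcal{N}) = \mathbb{E}\, \mathcal{N} f(\mathcal{N}) \qquad (f \in C^1(\mathbb{R})),$$

by using integration by parts, we consider the average

$$q\, \mathbb{E}\bigl( (g(Y_k) - g(Y_k - 1)) Y_k \bigr) = q \sum_{0\le j\le k} \bigl( g(j) - g(j-1) \bigr)\, j\, \mathbb{P}(Y_k = j),$$

and apply summation by parts, which yields, by (32),

$$\begin{aligned}
q\, \mathbb{E}\bigl( (g(Y_k) - g(Y_k - 1)) Y_k \bigr) &= q \sum_{0\le j\le k} g(j) \bigl( j\, \mathbb{P}(Y_k = j) - (j+1)\, \mathbb{P}(Y_k = j+1) \bigr) \\
&= \sum_{0\le j\le k} g(j) \bigl( jq - p(k-j) \bigr)\, \mathbb{P}(Y_k = j) \\
&= q\, \mathbb{E}\, Y_k g(Y_k) - p\, \mathbb{E}(k - Y_k) g(Y_k).
\end{aligned}$$

Thus the identity

$$q\, \mathbb{E}\, Y_k g(Y_k - 1) = p\, \mathbb{E}(k - Y_k) g(Y_k) \tag{33}$$

holds for any function $g : \{0, 1, 2, \dots, k\} \to \mathbb{R}$.
A simpler proof of (33) starts with the relation

$$q(j+1)\, \mathbb{P}(Y_k = j+1) = p(k-j)\, \mathbb{P}(Y_k = j) \qquad (0 \le j < k), \tag{34}$$

multiplies both sides by $g(j)$, and sums over all indices $j$, giving rise to

$$q \sum_{j} (j+1) g(j)\, \mathbb{P}(Y_k = j+1) = p \sum_{j} (k-j) g(j)\, \mathbb{P}(Y_k = j),$$

which is nothing but (33).

Conversely, if for some discrete random variable $Z$ the identity

$$\mathbb{E}\bigl( q Z g(Z-1) + p(Z-k) g(Z) \bigr) = 0$$

holds for any function $g(j)$, then the probabilities $\mathbb{P}(Z = j)$ satisfy the equation (34) as $\mathbb{P}(Y_k = j)$.
Thus

$$\mathbb{P}(Z = j) = \mathbb{P}(Y_k = j).$$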
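The identity (33), $q\,\mathbb{E}\,Y_k g(Y_k-1) = p\,\mathbb{E}(k-Y_k)g(Y_k)$, holds for every test function $g$, so it can be confirmed exactly for arbitrary choices. The sketch below evaluates both sides with rational arithmetic; the function name and the sample choices of $g$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def stein_identity_sides(k, p, g):
    """Return (q*E[Y g(Y-1)], p*E[(k-Y) g(Y)]) for Y ~ Binom(k, p), q = 1 - p."""
    q = 1 - p
    pmf = [comb(k, j) * p ** j * q ** (k - j) for j in range(k + 1)]
    # The term j = 0 vanishes on the left, so g(-1) is never evaluated.
    left = q * sum(pmf[j] * j * g(j - 1) for j in range(1, k + 1))
    right = p * sum(pmf[j] * (k - j) * g(j) for j in range(k + 1))
    return left, right
```

Passing `p` as a `Fraction` keeps the comparison exact, for example `stein_identity_sides(7, Fraction(1, 3), lambda j: j ** 3 - 2 * j + 5)`.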

3.3.2 Binomial approximation 

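In the special case when $p = q = 1/2$, we have

$$\mathbb{E}\bigl( (Y_k - k) g(Y_k) + Y_k g(Y_k - 1) \bigr) = 0. \tag{35}$$

Thus we expect that the above quantity will be small for any random variable whose distribution
is close to $\mathrm{Binom}(k, 1/2)$. Assume $h : \{0, 1, \dots, \lambda\} \to \mathbb{C}$ is an arbitrary function. Let $g$ be a
solution to the recurrence relation

$$(x - \lambda) g(x) + x g(x - 1) = h(x) - \mathbb{E}(h(Y_\lambda)). \tag{36}$$

Then we can represent the difference of means as

$$\mathbb{E}(h(X_n)) - \mathbb{E}(h(Y_\lambda)) = \mathbb{E}\bigl( (X_n - \lambda) g(X_n) + X_n g(X_n - 1) \bigr).$$

By (35), the expectation on the right-hand side of the above identity would be
zero if $X_n$ were distributed according to the binomial distribution $\mathrm{Binom}(\lambda, 1/2)$. Thus we expect that
this quantity will be small if the distribution of $X_n$ is close to $\mathrm{Binom}(\lambda, 1/2)$.

Recalling that $A_j$ is defined in (30), we see that $\mathbb{P}(X_n < x \mid U_n \in A_{j-1}) = \mathbb{P}(Y_{\lambda_j} + j - 1 < x)$ for $1 \le j \le s$.

It follows that

$$\begin{aligned}
\mathbb{E}(h(X_n)) - \mathbb{E}(h(Y_\lambda)) &= \mathbb{E}\bigl( (X_n - \lambda) g(X_n) + X_n g(X_n - 1) \bigr) \\
&= \sum_{0\le j<s} \mathbb{P}(U_n \in A_j)\, \mathbb{E}\bigl( (X_n - \lambda) g(X_n) + X_n g(X_n - 1) \bigm| U_n \in A_j \bigr) \\
&= \sum_{1\le j\le s} \frac{2^{\lambda_j}}{n}\, \mathbb{E}\bigl( (Y_{\lambda_j} + j - 1 - \lambda)\, g(Y_{\lambda_j} + j - 1) + (Y_{\lambda_j} + j - 1)\, g(Y_{\lambda_j} + j - 2) \bigr).
\end{aligned}$$

The term with $j = 1$ in the last sum is zero since $Y_\lambda$ is binomially distributed. Hence, with
$g_j(x) := g(x + j - 1)$, we then obtain

$$\begin{aligned}
\mathbb{E}(h(X_n)) - \mathbb{E}(h(Y_\lambda)) &= \sum_{2\le j\le s} \frac{2^{\lambda_j}}{n}\, \mathbb{E}\bigl( (\lambda_j - \lambda + j - 1)\, g_j(Y_{\lambda_j}) + (j - 1)\, g_j(Y_{\lambda_j} - 1) \bigr) \\
&\quad + \sum_{2\le j\le s} \frac{2^{\lambda_j}}{n}\, \mathbb{E}\bigl( (Y_{\lambda_j} - \lambda_j)\, g_j(Y_{\lambda_j}) + Y_{\lambda_j}\, g_j(Y_{\lambda_j} - 1) \bigr).
\end{aligned}$$

But the second sum is identically zero by (35). It follows that

$$\mathbb{E}(h(X_n)) - \mathbb{E}(h(Y_\lambda)) = \sum_{2\le j\le s} \frac{2^{\lambda_j}}{n}\, \mathbb{E}\bigl( (\lambda_j - \lambda + j - 1)\, g(Y_{\lambda_j} + j - 1) + (j - 1)\, g(Y_{\lambda_j} + j - 2) \bigr),$$

which can be alternatively rewritten as

$$\mathbb{E}(h(X_n)) - \mathbb{E}(h(Y_\lambda)) = \mathbb{E}(Q_1 g(X_n)) + \mathbb{E}(Q_2 g(X_n - 1)), \tag{37}$$

where $Q_1$ is a random variable taking value $\lambda_j - \lambda + j - 1$ if $U_n \in A_{j-1}$ and $Q_2$ takes value $j - 1$
if $U_n \in A_{j-1}$, for $1 \le j \le s$. Note that

$$\mathbb{E}|Q_1| = O\Bigl( \frac{\lambda - \lambda_2}{2^{\lambda - \lambda_2}} \Bigr),$$

and, similarly, $\mathbb{E}(Q_2) = O\bigl( 1/2^{\lambda - \lambda_2} \bigr)$.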

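3.3.3 Solving the equation $(x - \lambda) g(x) + x g(x - 1) = h(x) - \mathbb{E}(h(Y_\lambda))$

Solving the equation (36) is equivalent to finding the solution $x_m$ of the difference equation

$$(k - m) x_m - m x_{m-1} = \delta_m,$$

for $0 \le m \le k$ (note that $x_{-1}$ and $x_k$ do not affect the solution of this equation and therefore
can be assumed to be equal to zero), where the $\delta_m$'s are given and satisfy the condition

$$\sum_{0\le m\le k} \binom{k}{m} \delta_m = 0.$$

For (36) one takes $k = \lambda$, $x_m = g(m)$ and $\delta_m = \mathbb{E}(h(Y_\lambda)) - h(m)$.
The solution is obtained by introducing new variables $z_m = (k - m) \binom{k}{m} x_m$ for which our
difference equation takes the form

$$z_m - z_{m-1} = \binom{k}{m} \delta_m.$$

Iterating this, we obtain the following solution to Stein's equation (36).

Lemma 3.5 ([121]). Let $Y_\lambda \sim \mathrm{Binom}(\lambda, 1/2)$. Define the function $g : \{0, 1, \dots, \lambda - 1\} \to \mathbb{C}$
by

$$g(m) = -\frac{1}{(\lambda - m) \binom{\lambda}{m}} \sum_{0\le r\le m} \binom{\lambda}{r} \bigl( h(r) - \mathbb{E}(h(Y_\lambda)) \bigr).$$

Then $g$ is the solution to the recurrence equation

$$(x - \lambda) g(x) + x g(x - 1) = h(x) - \mathbb{E}(h(Y_\lambda)),$$

for all $x \in \{1, \dots, \lambda - 1\}$.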
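A short sketch can confirm that a partial-sum formula of this type solves Stein's equation $(x-\lambda)g(x) + xg(x-1) = h(x) - \mathbb{E}(h(Y_\lambda))$. The explicit form used below, $g(m) = -\bigl((\lambda-m)\binom{\lambda}{m}\bigr)^{-1}\sum_{0\le r\le m}\binom{\lambda}{r}\bigl(h(r)-\mathbb{E}(h(Y_\lambda))\bigr)$, is the one obtained by telescoping the substitution $z_m = (\lambda-m)\binom{\lambda}{m}g(m)$; it is stated here as an assumption, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def stein_solution(lam, h):
    """Return ([g(0), ..., g(lam-1)], E h(Y_lam)) for Y_lam ~ Binom(lam, 1/2).

    g(m) = -(1 / ((lam-m) * C(lam,m))) * sum_{r<=m} C(lam,r) * (h(r) - E h).
    """
    mean_h = sum(Fraction(comb(lam, r), 2 ** lam) * h(r) for r in range(lam + 1))
    g, partial = [], Fraction(0)
    for m in range(lam):
        partial += comb(lam, m) * (h(m) - mean_h)   # running weighted sum
        g.append(-partial / ((lam - m) * comb(lam, m)))
    return g, mean_h
```

With an indicator function such as `h = lambda r: 1 if r in (2, 3, 7) else 0`, the residual of the recurrence is exactly zero at every interior point, and also at the two boundary points where one of the coefficients vanishes.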
Note that 



^ \mJ m.<:'r<\ ^ ^ 



(A-m)(^) ^ 

Lemma 3.6. The sequence 

is monotonically increasing in m. 

Proof. By induction using the recurrence relation 

m + 1 
A — m 

and the monotonicity of fr^- □ 

Lemma 3.7 ([84], [117]). If $0 \le h(m) \le 1$, then the solution of Stein's equation provided by
Lemma 3.5 satisfies the uniform estimate

$$|g(m)| = O(\lambda^{-1/2}) \qquad (m = 0, 1, \dots).$$
Proof If m ^ A/2, then, by the monotonicity of the sequence i/m, we obtain 



(A--)C)o^^™W A-m A-LA/2J 



(A - LA/2J)(l^/2j) 0^r^LA/2j 

LV2J(l.),j) ^ ^ 

The case when m > LA/2J is treated similarly. Indeed, if m > LA/2J, then, using the identity 



we have 



1 (^\ _ y^-m-i ^ yX-lX/2\ 



^ ^^IT^ = O (A-1/2) . 



□ 
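3.3.4 Proof of Theorem 2.4 by Stein's method

Assume now that $A \subset \mathbb{R}$ is an arbitrary set. Define

$$h(m) := I_A(m) = \begin{cases} 1, & \text{if } m \in A; \\ 0, & \text{if } m \notin A. \end{cases}$$

Then

$$\mathbb{P}(X_n \in A) - \mathbb{P}(Y_\lambda \in A) = \mathbb{E}(Q_1 g(X_n)) + \mathbb{E}(Q_2 g(X_n - 1))
= O\Bigl( \frac{\mathbb{E}|Q_1| + \mathbb{E}|Q_2|}{\sqrt{\lambda}} \Bigr)
= O\Bigl( \frac{\lambda - \lambda_2}{2^{\lambda - \lambda_2} \sqrt{\lambda}} \Bigr).$$

Thus

$$d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) = O\Bigl( \frac{\lambda - \lambda_2}{2^{\lambda - \lambda_2} \sqrt{\lambda}} \Bigr).$$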

3.3.5 A refinement of Theorem 2.4 

A finer result can be obtained by using the following lemma. 
Lemma 3.8 ( [<:>]). IfO ^ h{x) ^ 1 and g is defined in (3.5), then 

max \gij)-gij - 1)| ^ 2min j-, -^l ^ ^. 



Proof. If m ^ A/2, then 
g{m)-g{m-l) 



+ 



him)-E{hiYx)) 



A — m 

By the elementary inequality 



A \ / m \ ^ X 
m — r ' 



A — m + 1 / Vm/ ' 



we see that 



1 1 ^ ei 

(■^) ' A - m 

O^r^m— 1 Vm/ 

^ X — 2m / m A^ 1 



m(X — m) ^ \A — m + 1/ A — m 
A - 2m 



^ A— »n+l 



m(A — m) 1 — , , -I A — m 

\ ' A— m+l 

A - 2m 1 

+ 



(A — m)(A — 2m + 1) A — m 
2 



A — m 
In a similar way we obtain the estimate

$$|g(m) - g(m - 1)| \le \frac{2}{m},$$

in the case when $m > \lambda/2$. □

The following result is similar in nature to that obtained by Soon [ ] for unbounded
function $h(x)$ that he later applied to derive several large and moderate deviations results for
$X_n$.

Proposition 3.9. Assume $h$ is any real function such that $0 \le h(x) \le 1$. Then

$$\mathbb{E}(h(X_n)) - \mathbb{E}(h(Y_\lambda)) = 4 a_1(n)\, \mathbb{E}\Bigl( h(Y_\lambda)\, \frac{\lambda/2 - Y_\lambda}{\lambda} \Bigr) + O\Bigl( \frac{(\lambda - \lambda_2)^2}{2^{\lambda - \lambda_2} \lambda} \Bigr),$$

where $a_1(n) = F(\log_2 n)$ is defined in (20).



Proof. The lemma implies that g(x + j — 1) = g{x) + 0{j/k). Since Yk+s has the same 
distribution as Yk + Ws, where Wg is independently and binomially distributed Ws ~ B(s, 1/2), 
we can replace the mean E,{g{Yk+s)) by ¥.{g(Yk + Ws)), the error so introduced being bounded 
above by 

Hoiyk+s)) - HgiYk)) = HoiYk + Ws) - E{g{Yk))) = o (^) , 
where we used the estimate \ Ws\ ^ s. Thus 

¥.{h{X^))-E{h{Y,))= J2 ^mXj-X+J-l)9{Yx,+J-l) + {j-l)g{Yx^+J-2)) 



(A - X2)' 



2a^in)EigiY,)) + O , ^a-,^^ 
We now evaluate the quantity E{g{Yx)) appearing in the last expression 

= i E E (^) (MO - E(Mn))) 

0^m<A ^ ^ V '"''Km) O^rsim ^ ' 

-^E E 

= E({h{Y,)-E{h{Yx))) T^) 



E[(Mn)-E(Mn))) [ 5^ ^ 



E 



m ^ X — m 

X/2^m<X 
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since E(/i(Fa) — ^h{Yx)) = and the sum X]A/2<m<A 
Yx. Then 



(A — m) Ms a constant independent of 

E{g{Yx)) = - J2 nyx = r){h{r)-E{h{Yx))) 



m=A/2 



A — m ' 



where we use the convention that J2m=a ^ ~ X]m=b- '-h^'^ ^P^^*^ '^h^ ^"^"^ ^^^'^ ^"^^ parts and 
obtain 



|r-A/2|s£A3/4 m=A/2 

If A/2 ^ r ^ A/2 + A3/4, then 

1 



1 



A — m VA — m A/2 

m=A/2 m=A/2 ^ ' 



1 A ^ r - rA/21 + 1 



A/2 



^-V2 >A m - A/2 
"^^^/2(^^^^ ^ ^ 

The same estimate holds when r lies in the range A/2 — A^/^ ^ r ^ A/2. Thus 

E{g{Yx)) = 2E - E(/i(r,))) ^^^y^) + O (^^E| - E(/i(n)) | (F, - A/2)' 

+ 0(A-i+P(|Fa-A/2|>A3/^)) 



2E (/i(rA)-E(/i(rA))) 



A/2 - Ya 
A 



+ 0(-E(rA-A/2)^)+0(A-^) 



2E ( {h{Yx) - E{h{Yx))) ^11^ ] + 0(A-i^ 



A 

= 2E (Mn)^^^)+0(A-^), 
since E (A/2 — Fa) = 0. This proves the proposition. 



□ 



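3.3.6 Corollaries of Proposition 3.9

Corollary 3.10. We have

$$d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) = |a_1(n)|\, \mathbb{E}\Bigl| \frac{Y_\lambda - \lambda/2}{\lambda/2} \Bigr| + O\Bigl( \frac{(\lambda - \lambda_2)^2}{2^{\lambda - \lambda_2} \lambda} \Bigr). \tag{38}$$

Proof. By the definition of the total variation distance

$$d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) = \sup_{h} |\mathbb{E}(h(X_n)) - \mathbb{E}(h(Y_\lambda))|,$$

where the supremum is taken over all functions $h$ assuming only binary values $\{0, 1\}$. It is easy
to see that the supremum of the average containing $h$ in the above relation is reached by the
function

$$h(x) = \begin{cases} 1, & \text{if } x \le \lambda/2, \\ 0, & \text{if } x > \lambda/2, \end{cases}$$

and we thus get, by Proposition 3.9, the estimate

$$d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) = |a_1(n)|\, \mathbb{E}\Bigl| \frac{Y_\lambda - \lambda/2}{\lambda/2} \Bigr| + O\Bigl( \frac{(\lambda - \lambda_2)^2}{2^{\lambda - \lambda_2} \lambda} \Bigr).$$

□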



Corollary 3.11. For all c ^ X — X2 with c large enough, we have 



|Qi(n)| 



1 + 



A- As 



Proof. This follows from the estimate (38) because 

Yx - A/2 



E 



VX/2 



1/2N 



and the quantity $|a_1(n)|$ can be bounded from below by

A-A2 



2A-A2 



0{ai{n)), 



if $\lambda - \lambda_2 > c$ with $c > 0$ large enough.



□ 



Remark. The Stein method we adopted above for the analysis of the sum-of-digits function differs
from the original approach by Stein [ ]. First, he used $Y_{\lambda+1}$ in lieu of $Y_\lambda$ as a good
approximation to $X_n$, and derived a bound for the point metric. Second, instead of exploiting the fact
that $X_n$ is a mixture of binomial distributions, his analysis is more subtle and is based on the
construction of an exchangeable pair. By doing so he managed to simplify the right-hand side
of (36) to

$$\mathbb{E}(h(X_n)) - \mathbb{E}(h(Y_{\lambda+1})) = \mathbb{E}\bigl( (X_n - (\lambda + 1))\, g_h(X_n) + X_n\, g_h(X_n - 1) \bigr) = \mathbb{E}(Q\, g_h(X_n)), \tag{39}$$

where $Q$ is a random variable such that $0 \le \mathbb{E}(Q) \le 2$ and $g_h$ is the solution to the recurrence
equation

$$h(x) - \mathbb{E}(h(Y_{\lambda+1})) = (x - (\lambda + 1))\, g(x) + x\, g(x - 1),$$

for $x \in \{0, 1, \dots, \lambda\}$ whose precise expression is given by Lemma 3.5 with $k = \lambda + 1$.
The estimate of Lemma 3.7 together with the property $0 \le \mathbb{E}(Q) \le 2$ now immediately
give the estimate of the total variation distance $d_{TV}(\mathscr{L}(X_n), \mathscr{L}(Y_{\lambda+1})) = O(\lambda^{-1/2})$. Further
applications of similar ideas will be explored elsewhere.
3.4 The Krawtchouk-Parseval approach: $\chi^2$-distance

We develop in this section yet another approach based on properties of the Krawtchouk
polynomials and the Parseval identity. The approach is the binomial analogue of the Charlier-Parseval
approach we developed and explored earlier in [13-!]. We consider only the simplest case of
deriving the $\chi^2$-distance, leaving the extension to other distances, which follows readily from
the framework developed in [ i j-h], to the interested reader.

3.4.1 Krawtchouk polynomials 

We start with reviewing the definition of Krawtchouk polynomials and some of their well- 
known properties (see [125, pp. 35-37]). 

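Assume that $p$ and $q$ are nonnegative real numbers such that $p + q = 1$.

Introduce the notation

$$B(N, t) := \binom{N}{t} p^t q^{N-t}.$$

The Krawtchouk polynomials $K_n(t) = K_n(N, t)$ are defined by

$$\sum_{0\le n\le N} K_n(t)\, w^n = (1 + qw)^t (1 - pw)^{N-t}. \tag{40}$$

Multiplying both sides by $B(N, t) z^t$ and summing over all $t$ from $0$ to $N$, we obtain

$$\sum_{0\le t\le N} B(N, t)\, z^t \sum_{0\le j\le N} K_j(t)\, w^j = \bigl( pz(1 + qw) + q(1 - pw) \bigr)^N.$$

Taking the coefficients of $w^n$ on both sides, we get

$$\sum_{0\le t\le N} B(N, t)\, K_n(t)\, z^t = \binom{N}{n} (pq)^n (z - 1)^n (pz + q)^{N-n}.$$

On the other hand, by (40), we have

$$\begin{aligned}
\sum_{0\le m, n\le N} w^m z^n \sum_{0\le t\le N} B(N, t)\, K_m(t) K_n(t)
&= \sum_{0\le t\le N} B(N, t) \bigl( (1 + qw)(1 + qz) \bigr)^t \bigl( (1 - pw)(1 - pz) \bigr)^{N-t} \\
&= \bigl( p(1 + qw)(1 + qz) + q(1 - pw)(1 - pz) \bigr)^N \\
&= (1 + pqzw)^N.
\end{aligned}$$

Accordingly, we obtain the orthogonality relation

$$\sum_{0\le t\le N} B(N, t)\, K_m(t) K_n(t) = \delta_{m,n} \binom{N}{n} (pq)^n.$$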
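The orthogonality of the Krawtchouk polynomials with respect to the binomial weights can be confirmed exactly for small $N$. The sketch below assumes the generating-function normalisation $\sum_n K_n(N,t)w^n = (1+qw)^t(1-pw)^{N-t}$ (the one consistent with the product $(1+pqzw)^N$ appearing in this subsection) and the weight $B(N,t) = \binom{N}{t}p^tq^{N-t}$; function names are illustrative.

```python
from fractions import Fraction
from math import comb

def krawtchouk(n, N, t, p):
    """K_n(N, t) = [w^n] (1 + q w)^t (1 - p w)^(N - t), with q = 1 - p."""
    q = 1 - p
    return sum(comb(t, j) * q ** j * comb(N - t, n - j) * (-p) ** (n - j)
               for j in range(n + 1))

def inner_product(m, n, N, p):
    """sum_t B(N, t) K_m(t) K_n(t), with B(N, t) = C(N, t) p^t q^(N - t)."""
    q = 1 - p
    return sum(comb(N, t) * p ** t * q ** (N - t)
               * krawtchouk(m, N, t, p) * krawtchouk(n, N, t, p)
               for t in range(N + 1))
```

Under this normalisation the inner product of $K_m$ and $K_n$ is $\binom{N}{n}(pq)^n$ when $m = n$ and zero otherwise, which the rational arithmetic reproduces without rounding.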



3.4.2 The Parseval identity for Krawtchouk polynomials 

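Let $f(z)$ be a polynomial of degree not greater than $N$. Thus

$$f(z) = \sum_{0\le t\le N} f_t z^t,$$

and we have the expansion

$$\frac{f_t}{B(N, t)} = \sum_{0\le j\le N} \frac{K_j(t)}{\binom{N}{j} (pq)^j} \sum_{0\le s\le N} f_s K_j(s). \tag{41}$$

Taking square of the above identity, multiplying it by $B(N, t)$ and summing the resulting
identity with respect to $t$, we obtain

$$\sum_{0\le t\le N} \Bigl| \frac{f_t}{B(N, t)} \Bigr|^2 B(N, t) = \sum_{0\le j\le N} \frac{1}{\binom{N}{j} (pq)^j} \Bigl| \sum_{0\le s\le N} f_s K_j(s) \Bigr|^2. \tag{42}$$

By the definition (40), we deduce that

$$\sum_{0\le j\le N} w^j \sum_{0\le s\le N} f_s K_j(s) = (1 - pw)^N f\Bigl( \frac{1 + qw}{1 - pw} \Bigr).$$

Comparing this identity with (42), we conclude that

$$\sum_{0\le t\le N} \Bigl| \frac{f_t}{B(N, t)} \Bigr|^2 B(N, t) = \sum_{0\le j\le N} \frac{|c_j|^2}{\binom{N}{j} (pq)^j},$$

where $c_j$ is defined by

$$(1 - pw)^N f\Bigl( \frac{1 + qw}{1 - pw} \Bigr) = \sum_{0\le j\le N} c_j w^j.$$

Now by the Parseval identity

$$J(f, N; r) := \frac{1}{2\pi} \int_{-\pi}^{\pi} \Bigl| (1 - pre^{it})^N f\Bigl( \frac{1 + qre^{it}}{1 - pre^{it}} \Bigr) \Bigr|^2 dt = \sum_{0\le j\le N} |c_j|^2 r^{2j}. \tag{43}$$

Comparing (43) with (42), and using the relation

$$\int_0^\infty \frac{u^j}{(1 + u)^{N+2}}\, du = \frac{1}{(N + 1) \binom{N}{j}},$$

we obtain the Krawtchouk-Parseval identity

$$\sum_{0\le t\le N} \Bigl| \frac{f_t}{B(N, t)} \Bigr|^2 B(N, t) = (N + 1) \int_0^\infty J\Bigl( f, N; \sqrt{\frac{u}{pq}} \Bigr) \frac{du}{(1 + u)^{N+2}}, \tag{44}$$

which is crucial for deriving the asymptotics of the $\chi^2$-distance.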
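3.4.3 The $\chi^2$-distance

For any non-negative integer-valued random variables $Z$ and $W$, the $\chi^2$-distance is defined by

$$\chi^2(\mathscr{L}(Z), \mathscr{L}(W)) := \sum_{j\ge 0} \Bigl| \frac{\mathbb{P}(Z = j)}{\mathbb{P}(W = j)} - 1 \Bigr|^2 \mathbb{P}(W = j),$$

provided that the series on the right-hand side has a meaning. It possesses two important
properties. First, its square root upper-bounds the total variation distance

$$d_{TV}(\mathscr{L}(Z), \mathscr{L}(W)) \le \frac{1}{2} \sqrt{\chi^2(\mathscr{L}(Z), \mathscr{L}(W))}.$$

Second, it also provides an effective upper bound for the Kullback-Leibler divergence (or
information divergence)

$$d_{KL}(\mathscr{L}(Z), \mathscr{L}(W)) := \sum_{j\ge 0} \mathbb{P}(Z = j) \log \frac{\mathbb{P}(Z = j)}{\mathbb{P}(W = j)} \le \chi^2(\mathscr{L}(Z), \mathscr{L}(W)),$$

a very useful measure in information theory and related applications.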
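The two comparison inequalities can be observed numerically for the digit-sum distribution against $\mathrm{Binom}(\lambda, 1/2)$. The sketch below assumes the standard definitions $\chi^2 = \sum_j (p_j/q_j - 1)^2 q_j$, $d_{TV} = \frac12\sum_j |p_j - q_j|$ and $d_{KL} = \sum_j p_j\log(p_j/q_j)$, and uses floating-point arithmetic; the function names are hypothetical.

```python
from math import comb, log, sqrt

def digit_sum_pmf(n):
    """Law of X_n, the binary digit sum of a uniform integer in {0, ..., n-1}."""
    lam = n.bit_length() - 1
    counts = [0] * (lam + 1)          # digit sums of 0..n-1 never exceed lam
    for k in range(n):
        counts[bin(k).count("1")] += 1
    return [c / n for c in counts]

def distances_to_binomial(n):
    """Return (d_TV, chi^2, d_KL) between X_n and Binom(lam, 1/2)."""
    lam = n.bit_length() - 1
    p = digit_sum_pmf(n)
    q = [comb(lam, k) / 2 ** lam for k in range(lam + 1)]
    tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    chi2 = sum((a / b - 1) ** 2 * b for a, b in zip(p, q))
    kl = sum(a * log(a / b) for a, b in zip(p, q) if a > 0)
    return tv, chi2, kl
```

For any `n`, the returned triple should satisfy `tv <= 0.5 * sqrt(chi2)` and `kl <= chi2` up to rounding error.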

Theorem 3.12. The $\chi^2$-distance between the distribution of $X_n$ and the binomial distribution
of $Y_\lambda$ satisfies

$$\chi^2(\mathscr{L}(X_n), \mathscr{L}(Y_\lambda)) = O(\lambda^{-1}).$$



1 I \ ^ 

1 + 2;^ 



Proof. Let 

f{z) := P„(z) - 
Then, by (13), 

We need the elementary inequality 

1(1 + zYil -zf-l\^{l + \z\Y+^ - 1 ^ (a + 6)1^1(1 + \z\f+^-^, 

for nonnegative integers a, h with a + h ^ 1 . Applying this inequality, we get, with p = q = 1/2 
and = A, 

2 
Substituting this estimate into the Krawtchouk-Parseval identity (44), we have



E 



P(X„ = m) 

2^ \m/ 



2 \ 1/2 

1 ^y 



2-^ \ m 



2s:i^s 



^( /,A;2v/^) 
(A + 1) 



1/2 



A,- +3 



[1 + 

A-A,-l 



(1 + u)^^+^ 



1/2 



du 



Since the function $\sqrt{u}/(1 + u)$ reaches the maximum at the point $u = 1$, we see that

$$1 + \frac{2\sqrt{u}}{1 + u} \le 2.$$

It follows that 



1/2 ^ ^ A - Xj 



2(A-A,+l)/2 



2^i^s 



(A + 1) 



m2 



(l + u) 



A, +3 



du 



1/2 



2^i^s 



A + 1 



2(A-A,+l)/2 ^(^^. + 2)(A, + 1) 

A-A. 



1/2 



2(^-^^+i)/2(l + A,) 



It is clear that 



This proves the theorem. 



E 

A-A2^A;<A 



= E 

\ + 1 ^ 

fc(A + l) 
2^/2(1 + A - A:) 



A;(A + 1) 



0(1) 



□ 



Finer results can be derived by developing similar techniques as those used in [ ] for 
Poisson approximation. 



4 A general numeration system and applications 

The properties we studied above can be readily extended to a more general framework of
numeration systems in which we encode each integer by a different binary string and impose the
sole condition that

$$Z_{2n} \overset{d}{=} Z_n + I \qquad (n \ge 1), \tag{45}$$

where $Z_n$ denotes the number of 1's in the resulting coding string for a random integer,
assuming that each of the first $n$ nonnegative integers is equally likely, and $I \sim \mathrm{Bernoulli}(1/2)$. This
simple scheme covers in particular the binary coding of $X_n$ above (as can be easily checked)
and the binary reflected Gray code, which will be discussed in more detail later. Let $\mu(n)$
denote the number of 1's in the coding of $n$ in such a numeration system. All our results below
roughly say that this numeration system does not differ much from the binary coding although
the codings inside each $2^k$ block can be rather flexible.

Theorem 4.1 (Local limit theorem). Assume that $Z_n$ satisfies (45). Then $Z_n$ is asymptotically
normally distributed



P Zr, 



h X 

2 2 



1 + 



1 + \x\ 



(46) 



uniformly for $x = o(\lambda^{1/6})$, with mean and variance satisfying

\0g2n 



E(Z^ 



2 



+ G'i(log2n), 
+ G'2(log2^). 



(47) 



Here Gi, G2 are bounded periodic functions. 



For results related to moderate deviations, see [ ]. We can derive more precise Fourier 
expansions for the periodic functions Gi, G2 when more information is available. 

Theorem 4.2. Assume that Zn satisfies (45). Then 

E 



for m = 1, 2, . . . , where the sequence hrin) = br{2n) is defined by (see ( 13)) 



(48) 



In particular, 60 = 1^ bi{n) = E(Z„) — A/2, and 

A + 1 



A(A + 1) 



Corollary 4.3. 



dTy{^{Zn),^{Yx)) 



V2|Gl(l0g2^)l 

a/tt loga n 



O 



logn 



(49) 



where G'i(log2n) = E(Z„) — A/2 is periodic Gi{x + 1) = Gi{x) and continuous on the set 
M\N. 
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The periodic function Gi{x) can be defined as follows. Write 2^ = J2j^o 0^ ^ [1; 2). 
Note that Gi{x) = Gi{x) - {x}/2. 

Theorem 4.4. Assume, as above, that $n = 2^\lambda + 2^{\lambda_2} + \cdots + 2^{\lambda_s}$ with $\lambda > \lambda_2 > \cdots > \lambda_s \ge 0$.

Then 



2A^A, ^ ' ^ 
whenever A — A2 ^ c, where c is sufficiently large. 

4.1 Sketches of proofs 

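Most of our analysis is based on the following explicit expression; cf. (13).

Lemma 4.5. If $Z_n$ satisfies the condition (45), then the probability generating function of $Z_n$
satisfies

$$\mathbb{E}(y^{Z_n}) = \frac{1}{n} \sum_{1\le j\le s} y^{\mu(\lfloor n/2^{\lambda_j} \rfloor - 1)} (1 + y)^{\lambda_j}, \tag{50}$$

where $n = 2^\lambda + 2^{\lambda_2} + \cdots + 2^{\lambda_s}$ with $\lambda > \lambda_2 > \cdots > \lambda_s \ge 0$.

Proof. Observe that the crucial condition (45) implies the recurrence

$$\mathbb{E}(y^{Z_{2n}}) = \frac{1 + y}{2}\, \mathbb{E}(y^{Z_n}),$$

the same as (23) for $X_n$. Consequently, we also have, following the same analysis there,

$$\mathbb{E}(y^{Z_n}) = \frac{1}{n} \sum_{0\le j\le \lambda} \varepsilon_{\lfloor n/2^j \rfloor}\, y^{\mu(\lfloor n/2^j \rfloor - 1)} (1 + y)^j,$$

for $n \ge 1$, where $\varepsilon_m = 1$ if $m$ is odd and $\varepsilon_m = 0$ otherwise; compare (24). Since 2 divides $\lfloor n/2^j \rfloor$ if and only if $j \notin \{\lambda_1, \lambda_2, \dots, \lambda_s\}$, we obtain
(50). □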

From the expression (50), we easily obtain 



where fij := fi{\n/2'^^ \ — 1). The identity for the mean in (47) then follows with 
which is periodic and bounded since $\mu_j \le \lambda - \lambda_j + 1$. Also the definition of $G_1$ here for $\log_2 n$ can
be readily extended to all reals.

We now prove the identity in (47) for the variance as the proof is very simple. Let 



Zn '■— Zr, 

2 n 



A 1 V- / A- A 



Then 



A f^,^. XV A 



V(Z„)--=E(Z^)-(E(Z„) + - 



1 A ^ , + A^ + A , A(A-A,)\ ,^,= 



and we obtain the identity for the variance in (47) with 



G2{log,-) = ^ E 2-^^-^^^ - (A - A,) + (l^M^^^h^^ 

{log2 n}2 + {log2 n} 



G'i(log2n)^ - G'i(log2ra){log2ra} - 



which is also bounded and periodic, and extendible to all $x \in \mathbb{R}$.

The local limit theorem (46) is proved in a way similar to the proof of Proposition 3.2. 

In terms of probabilities, the identity (50) means that the random variable Zn can be ex- 
pressed as a mixture of shifted binomial random variables. Its distribution can be described in 
the following way. Let (n be a random variable defined by 

n 

Then 

nZneA\Cn = j) = nYx,+r,eA), 

for any A C M, where rj := /i( [n/2'^j j — 1) ^ A — Aj + 1 and tq := 0. By the same arguments 
used above, we see that the identity 

E (/z(/x(X„))) = J2 - A + rMYx, + rj) + r,g{Yy^ + r, - 1)) 

holds for any function $h : \mathbb{R} \to \mathbb{R}$, where $g$ is the solution to Stein's equation (36).
We skip all details of the proofs as they are almost identical to those for $X_n$.



4.2 Gray code 

The Gray code is characterized by the property that the codings of any two successive integers
differ by exactly one bit. It is named after Frank Gray's 1947 patent, although the same
construction had been used in telegraphy in the late nineteenth century by the French engineer
Émile Baudot; see Wikipedia's page on Gray code for more information. The coding notion
that two neighboring objects differ at exactly one location has turned out to be extremely useful in
many scientific disciplines beyond the original communication motivations, such as experimental
designs, job scheduling in computer systems, and combinatorial generation; see the survey
paper [ ] and the references therein.

The binary reflected Gray code is constructed by reflecting (or mirroring) the $2^k$ codings
of the first $2^k$ nonnegative integers and then adding 1 at the beginning of each coding,
resulting in the Gray code for the first $2^{k+1}$ nonnegative integers; see Figure 12 for an
illustration.
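The reflection step can be sketched in a few lines. This is an illustration under one stated assumption: codings are padded with leading 0's to a common length $k$ so that the blocks line up, whereas the text writes codings without leading zeros. The function name is hypothetical.

```python
def reflected_gray_codes(k: int) -> list[str]:
    """Codings of 0, ..., 2^k - 1 as k-bit strings, built by repeated reflection."""
    codes = [""]  # the single coding of length 0
    for _ in range(k):
        # keep the old block (padded with a leading 0), then append the
        # mirrored block with a leading 1
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes


codes = reflected_gray_codes(4)

# successive codings differ in exactly one position
assert all(sum(a != b for a, b in zip(x, y)) == 1 for x, y in zip(codes, codes[1:]))

# the construction agrees with the well-known closed form n XOR (n >> 1)
assert [int(c, 2) for c in codes] == [n ^ (n >> 1) for n in range(16)]
```

The second assertion is convenient in practice: the closed form `n ^ (n >> 1)` gives the coding of a single integer without building the whole table.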





Figure 12: Constructions of binary code (left) and Gray code (middle),
and the Gray code of the first few integers (right).

By construction, the Gray code, say $g(2^k + j)$, of $2^k + j$ with $0 \le j < 2^k$ is equal to
$1\,0^i\,g(2^k - 1 - j)$ (string concatenation), where $i := k - 1 - \lfloor \log_2(2^k - 1 - j) \rfloor$ and $0^i$ means 0
written $i$ times. For example,

$$g(19) = 1\,g(12) = 1\,1\,0\,g(3) = 11010.$$
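The string recursion can be traced mechanically. In the sketch below the function names are illustrative, and one convention is an assumption of this example rather than of the text: the coding of 0 is taken to be the empty string, so that for $j = 2^k - 1$ the leading 1 is followed by $k$ zeros.

```python
def gray(n: int) -> str:
    """Binary reflected Gray coding of n without leading zeros (closed form n XOR (n >> 1))."""
    return format(n ^ (n >> 1), "b") if n > 0 else ""


def gray_by_recursion(n: int) -> str:
    """Coding of n = 2^k + j obtained as 1 0^i g(2^k - 1 - j)."""
    if n == 0:
        return ""  # assumed convention: 0 has the empty coding
    k = n.bit_length() - 1
    j = n - (1 << k)
    m = (1 << k) - 1 - j
    # i = k - 1 - floor(log2 m) for m >= 1; for m = 0 this gives i = k
    i = k - m.bit_length()
    return "1" + "0" * i + gray_by_recursion(m)


assert gray_by_recursion(19) == "11010"
assert all(gray_by_recursion(n) == gray(n) for n in range(1024))
```

For $n = 19$ the calls are $19 \to 12 \to 3 \to 0$, with $i = 0, 1, 1$ zeros inserted in turn.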

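Thus the number of 1's, denoted by $\gamma(n)$, of $n$ under such a coding system satisfies the recurrence

$$\gamma(2^k + j) = 1 + \gamma(2^k - 1 - j) \qquad (51)$$

for $0 \le j < 2^k$ and $k \ge 1$. Another interesting type of recurrence is (by induction)

$$\gamma(n) = \gamma(\lfloor n/2 \rfloor) + \frac{1 - (-1)^{\lceil n/2 \rceil}}{2}$$

for $n \ge 1$, in contrast to

$$\nu(n) = \nu(\lfloor n/2 \rfloor) + \frac{1 - (-1)^{n}}{2}$$

for binary coding. Let now

$$R_n(z) := \sum_{0 \le j < n} z^{\gamma(j)}.$$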
Then, obviously,

$$R_{2^k}(z) = (1 + z)^k,$$

and, by (51),

$$R_{2^k + j}(z) = R_{2^k}(z) + z\bigl(R_{2^k}(z) - R_{2^k - j}(z)\bigr) = R_{2^k}(z)(1 + z) - zR_{2^k - j}(z) = (1 + z)^{k+1} - zR_{2^k - j}(z).$$

From this recurrence relation, we deduce by induction that

$$R_{2n}(z) = (1 + z)R_n(z)$$



for all $n \ge 1$. Thus the sum-of-digits function $Z_n$ of random integers under the Gray coding
satisfies (45), and thus Theorems 46, 4.2 and 4.4 and Corollary 4.3 all hold. Apart from
the mean and the variance, all results are new. The mean of $Z_n$ was first studied by Flajolet
and Ramshaw [46], who also gave more precise characterizations of $G_1$ (including a Fourier series
expansion). A closed-form expression for $\mathbb{E}(y^{Z_n})$ was derived by Kobayashi et al. [76] by
singular measures, together with exact expressions for all moments (non-centered).
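The recurrences for the Gray-code digit count and the resulting generating polynomial identities can be verified directly for small arguments. This is a rough sketch with illustrative names: polynomials are stored as `Counter` objects mapping exponents to coefficients, `gamma` uses the standard closed form `n ^ (n >> 1)`, and the term $(1 - (-1)^m)/2$ is evaluated as `m % 2`.

```python
from collections import Counter


def nu(n: int) -> int:
    """Number of 1's in the binary coding of n."""
    return bin(n).count("1")


def gamma(n: int) -> int:
    """Number of 1's in the binary reflected Gray coding of n."""
    return bin(n ^ (n >> 1)).count("1")


def R(n: int, mu=gamma) -> Counter:
    """R_n(z) = sum_{0 <= j < n} z^{mu(j)}, stored as {exponent: coefficient}."""
    return Counter(mu(j) for j in range(n))


def shift(p: Counter) -> Counter:
    """Multiply a polynomial by z."""
    return Counter({e + 1: c for e, c in p.items()})


for k in range(1, 9):
    full = R(2 ** k)
    for j in range(2 ** k):
        # recurrence (51)
        assert gamma(2 ** k + j) == 1 + gamma(2 ** k - 1 - j)
        # R_{2^k + j} + z R_{2^k - j} = (1 + z) R_{2^k}
        assert R(2 ** k + j) + shift(R(2 ** k - j)) == full + shift(full)

for n in range(1, 500):
    assert gamma(n) == gamma(n // 2) + ((n + 1) // 2) % 2  # ceil(n / 2) odd or even
    assert nu(n) == nu(n // 2) + n % 2
    assert R(2 * n) == R(n) + shift(R(n))  # R_{2n}(z) = (1 + z) R_n(z)
```

Passing `mu=nu` to `R` gives the corresponding polynomial for binary coding, for which the same doubling identity holds.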



Figure 14: $G_1(x)$

Figure 15: $G_2(x)$




For other properties related to $\gamma(n)$ and $Z_n$, see [34, 64, 73, 74, 76, 77, 106, 108].

So far, we considered only the goodness of approximations to $\mathscr{L}(X_n)$ and $\mathscr{L}(Z_n)$ by the
binomial distribution of $Y_\lambda$. It is also natural to consider approximations of $\mathscr{L}(Z_n)$ by $\mathscr{L}(X_n)$,
and the result is as follows.



v^|F(log2r2)-G(log2n)| 



+ 



where the difference $G(x) - F(x)$ is a continuous function for all $x$.



(52) 
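For small $n$, the distance between the two laws can be computed exactly by enumeration. The sketch below does not reproduce the asymptotic estimate (52). It only evaluates the total variation distance, taken here in the usual form $\frac12 \sum_m |\mathbb{P}(Z_n = m) - \mathbb{P}(X_n = m)|$ (an assumption about the normalization), with $Z_n$ the Gray-code digit sum and $X_n$ the binary one. Function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def nu(m: int) -> int:
    """Number of 1's in the binary coding of m."""
    return bin(m).count("1")


def gamma(m: int) -> int:
    """Number of 1's in the binary reflected Gray coding of m."""
    return bin(m ^ (m >> 1)).count("1")


def law(n: int, mu) -> dict:
    """Exact distribution of mu(U), with U uniform on {0, ..., n - 1}."""
    counts = Counter(mu(m) for m in range(n))
    return {k: Fraction(c, n) for k, c in counts.items()}


def total_variation(p: dict, q: dict) -> Fraction:
    """Half the l1 distance between two distributions on the integers."""
    support = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in support) / 2


def tv_gray_vs_binary(n: int) -> Fraction:
    return total_variation(law(n, gamma), law(n, nu))


# at powers of two both digit sums are exactly binomial, so the distance vanishes
assert all(tv_gray_vs_binary(2 ** k) == 0 for k in range(1, 12))
```

Tabulating `tv_gray_vs_binary(n)` for $n$ between two consecutive powers of two gives a quick picture of how the distance fluctuates with $n$.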



4.3 Beyond binary and Gray codings 

We give here another simple binary coding system for integers satisfying the condition (45). We
start with the observation that binary coding can be constructed not only in the usual translation
way, but also using reflection and complement (first reflect the whole block of $2^k$ numbers as
in Gray code, and then change every 1 to 0 and every 0 to 1); see Figure 16.
Figure 16: Two different ways of constructing the same binary code
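That reflection followed by complement reproduces the ordinary binary code is easy to confirm. The following sketch assumes codings are padded with leading 0's to a common length, which matters here because the complement turns padding zeros into ones. The function name is hypothetical.

```python
def binary_by_reflect_complement(k: int) -> list[str]:
    """Build the k-bit codings of 0, ..., 2^k - 1 by reflect-then-complement."""
    flip = str.maketrans("01", "10")
    codes = [""]
    for _ in range(k):
        # reflect the whole block, complement every bit, then add the leading 1
        mirrored = [c.translate(flip) for c in reversed(codes)]
        codes = ["0" + c for c in codes] + ["1" + c for c in mirrored]
    return codes


# the result is the usual binary coding, padded to 5 bits
assert binary_by_reflect_complement(5) == [format(n, "05b") for n in range(32)]
```

The reason is that the complement of the $k$-bit binary coding of $2^k - 1 - j$ is the $k$-bit binary coding of $j$, so the reflection and the complement cancel each other.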

We now consider a coding system using translation and complement. Let
denote the number of 1's in the coding of $n$. Then by construction
> 1



From this 



Figure 17: Yet another code
and it is straightforward to see that (45) holds in such a coding system. Thus $Z_n$
satisfies all properties stated in the beginning of this section.
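A small experiment makes the claim concrete. The sketch below rests on a reading of the construction that is an assumption of this example: the block of $k$-bit codings of $0, \dots, 2^k - 1$ is translated (copied in the same order), every bit of the copy is complemented, and a leading 1 is added, so that the number of 1's in the coding of $2^k + j$ is $1 + k$ minus that of $j$. The condition checked is the doubling identity $R_{2n}(z) = (1 + z)R_n(z)$, the form in which (45) is verified for the Gray code above. Names such as `trans_complement_codes` are illustrative.

```python
from collections import Counter


def trans_complement_codes(k: int) -> list[str]:
    """k-bit codings of 0, ..., 2^k - 1 built by translate-then-complement (assumed reading)."""
    flip = str.maketrans("01", "10")
    codes = [""]
    for _ in range(k):
        # old block with a leading 0, then its complemented copy with a leading 1
        codes = ["0" + c for c in codes] + ["1" + c.translate(flip) for c in codes]
    return codes


K = 10
ones = [c.count("1") for c in trans_complement_codes(K)]  # ones[n] = number of 1's in the coding of n


def R(n: int) -> Counter:
    """R_n(z) = sum_{0 <= j < n} z^{ones[j]}, stored as {exponent: coefficient}."""
    return Counter(ones[:n])


def times_one_plus_z(p: Counter) -> Counter:
    """Multiply a polynomial by (1 + z)."""
    return p + Counter({e + 1: c for e, c in p.items()})


# the digit count satisfies ones[2^k + j] = 1 + k - ones[j]
for k in range(K):
    assert all(ones[2 ** k + j] == 1 + k - ones[j] for j in range(2 ** k))

# doubling identity R_{2n}(z) = (1 + z) R_n(z)
assert all(R(2 * n) == times_one_plus_z(R(n)) for n in range(1, 2 ** (K - 1) + 1))
```

Under this reading the first eight codings are 000, 001, 011, 010, 111, 110, 100, 101, which coincides with the Gray code up to 3 and departs from it afterwards.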

To obtain other examples for which (45) holds, one may combine more block operations
(such as translation, horizontal or vertical reflection, reversal, flip, etc.) and string operations
(complement, reversal, cyclic rotation, rewriting, etc.). A simple example is the block translation
or reflection followed by any cyclic rotation of each coding (which does not change the
number of 1's). Such a coding scheme also satisfies (45).